Abstract

Large dimensional random matrix theory (RMT) has provided an efficient analytical tool to understand multiple-input multiple-output (MIMO) channels and to aid the design of MIMO wireless communication systems. However, previous studies based on large dimensional RMT rely on the assumption that the transmit correlation matrix is diagonal or that the propagation channel matrix is Gaussian. There is increasing interest in channels whose transmit correlation matrices are general nonnegative definite matrices and whose entries are non-Gaussian. This class of channel models appears in several applications of MIMO multiple access systems, such as small cell networks (SCNs). To address these problems, we use the generalized Lindeberg principle to show that the Stieltjes transforms of this class of random matrices coincide in the large dimensional regime, whether their independent entries are Gaussian or non-Gaussian. This result makes it possible to derive the deterministic equivalents (e.g., the Stieltjes transform and the ergodic mutual information) for non-Gaussian MIMO channels from the known results developed for Gaussian MIMO channels. It is therefore of great importance in characterizing the spectral efficiency of SCNs.
I. Introduction



The seminal works by Foschini et al. [1] and Telatar [2] revealed the huge capacity of multiple-input multiple-output (MIMO) antenna systems and shed light on capacity-achieving strategies for such systems. However, exact analysis of the achievable rates of MIMO channels can be difficult and, for some channel models, intractable. In the last few years, large-system approaches have emerged as a means to circumvent these mathematical difficulties. They were greatly motivated by the landmark contributions of Verdú-Shamai [8] and Tse-Hanly [9], which applied large dimensional random matrix theory (RMT) to various problems in information theory. Since then, a large body of performance analyses of various MIMO channels has been obtained with large dimensional random matrix tools. These include the Stieltjes transform method (or the Silverstein-Bai method) [10]¹, the Gaussian tools (integration by parts and the Poincaré-Nash inequality) [11], free probability [12], and the replica method [13]. See [14-16] for more details.

For channel matrices with Gaussian entries, the replica method serves as a powerful tool to derive the relevant results; it was originally developed in statistical physics. For example, it has been used to obtain asymptotic mutual information results for Rayleigh [25] and Rician [26] fading channels with separately correlated antennas. Nevertheless, this method is not mathematically rigorous. A mathematically sound procedure requires more advanced tools such as the Gaussian tools and the Stieltjes transform method. Using the Gaussian tools, Hachem et al. [27] and Dumont et al. [28] rigorously confirmed the asymptotic mutual information expressions for Rayleigh and Rician fading channels, respectively. Based on the Stieltjes transform method, Couillet et al. recently studied a MIMO multiple access channel (MAC) with separately correlated user channels [7]. In this case, each user's channel matrix, $\mathbf{H}_k$, can be written in the form $\mathbf{R}_k^{1/2}\mathbf{X}_k\mathbf{T}_k^{1/2}$. Here $\mathbf{X}_k$ has independent and identically distributed (i.i.d.) zero-mean Gaussian entries, and $\mathbf{R}_k$ and $\mathbf{T}_k$ are deterministic nonnegative definite matrices that characterize the spatial correlation at the receiver and the transmitter sides, respectively.

Strictly speaking, large-system results are only asymptotically tight. Even so, they provide reliable performance predictions for small system dimensions, at a much lower computational cost than Monte Carlo simulations, and they offer insight into the behavior of communication channels. Moreover, large-system results are important for designing many practical wireless systems. Examples include precoder design [7,17,18], optimal training length design [19,20], scheduling [21], and others [22,23]. In most contributions, the elements of the MIMO channel matrix are assumed to follow multivariate Gaussian distributions; that is, the amplitudes of the channel fading coefficients are either Rayleigh or Rician distributed. Although these are the most popular models for small-scale amplitude fading, a growing number of results suggest different models [3-5]. For example, [5] proposed that the Nakagami-m distribution is best suited for modeling small-scale amplitude fading in environments such as indoor residential/office, industrial, and suburban-like microcell environments. In addition, the log-normal distribution has recently been used to describe small-scale amplitude fading in IEEE 802.15.3a [4]. There is clearly an increasing demand to investigate channels with non-Gaussian fading and their performance. It is unknown whether systems specifically designed for Gaussian scenarios still work well in non-Gaussian environments, and the results available in the literature so far are too limited to answer this question [7, 29, 30].

¹ In recent years, this method due to Silverstein and Bai has been developed into a very useful tool, widely known as the Stieltjes transform method in the spectral analysis of large dimensional random matrices.

Figure 1: A small-cell network.

To appreciate the objective of this paper, it is important to understand the limitations of the existing results for non-Gaussian channels. In [7], the results were derived only under the assumption that each transmitter-side correlation matrix, $\mathbf{T}_k$, is diagonal. The authors of [7] conjectured that the results might remain valid when $\mathbf{T}_k$ is a general nonnegative definite matrix. A channel model composed of a general variance profile and a deterministic line-of-sight (LOS) component was studied in [29], which partially generalized the results in [7]. However, as compared to [7], the matrices $\mathbf{R}_k$ in [29] cannot be general (non-diagonal) nonnegative definite matrices.
This paper aims to extend previous large-system results to a more general class of random matrices with non-Gaussian entries. As in [7], we consider a $K$-user MIMO MAC in which each $\mathbf{H}_k$ is spatially correlated separately at both sides. In our model, a deterministic LOS component $\bar{\mathbf{H}}_k$ is also considered. More specifically, the Kronecker channel $\mathbf{R}_k^{1/2}\mathbf{X}_k\mathbf{T}_k^{1/2}$ of interest is described as follows:

- the entries of $\mathbf{X}_k$ are i.i.d. complex centered random variables (not necessarily Gaussian);
- the $\mathbf{T}_k$'s are deterministic nonnegative definite matrices;
- the $\mathbf{R}_k$'s are diagonal nonnegative matrices.

This model arises in small-cell networks (SCNs), as shown in Figure 1. SCNs are typically composed of densely deployed low-cost, low-power base stations (BSs), and they have attracted considerable attention for their potential to increase the capacity of cellular networks [6,7,20]. In these networks, the channel fading tends to be non-Gaussian. In contrast to [29], our model allows user equipments (UEs) to be equipped with multiple spatially correlated antennas, a typical situation given the limited space available in UEs.

There are several obstacles when one tries to extend the Stieltjes transform method, originally developed for diagonal $\mathbf{T}_k$ (e.g., [7,10]), to general nonnegative definite $\mathbf{T}_k$ [35]. To overcome these difficulties, we use the generalized Lindeberg principle [38,39]. We show that, under very mild conditions, the Stieltjes transforms of the considered random matrices with Gaussian entries and those with non-Gaussian entries coincide in the large dimensional regime. This result enables us to derive the deterministic equivalents (e.g., the Stieltjes transform and the ergodic mutual information) for non-Gaussian MIMO channels from the known results for Gaussian MIMO channels. We thereby extend the deterministic equivalents of previous results to SCNs. For uncorrelated channel matrices with i.i.d. entries, this property is implicit in the computer simulations of [36, Figure 4] and has recently been proved in [38, Corollary 2]. In our derivation, we prove that the deterministic equivalents of the MIMO MAC in [7] hold even if the entries of $\mathbf{X}_k$ are non-Gaussian and $\mathbf{R}_k$ and $\mathbf{T}_k$ are general deterministic nonnegative definite matrices.² Therefore, we fully prove the conjecture made in [7].

The remainder of this paper is structured as follows. In Section II, we introduce the channel model of the SCNs. Section III then presents our main results and outlines their proofs, whose details are given in the appendices. Some mathematical tools needed in proving the results are reviewed in Appendix D. Simulation results are provided in Section IV, and finally we conclude the paper in Section V.

Notations: Throughout this paper, the complex number field is denoted by $\mathbb{C}$.

- Matrix entries and transposes: for any matrix $\mathbf{A} \in \mathbb{C}^{M\times N}$, $A_{ij}$ denotes the $(i,j)$th entry, while $\mathbf{A}^T$ and $\mathbf{A}^H$ return the transpose and the conjugate transpose of $\mathbf{A}$, respectively.
- Square-matrix operations: for a square matrix $\mathbf{B}$, $\mathbf{B}^{1/2}$, $\mathbf{B}^{-1}$, $\mathrm{tr}(\mathbf{B})$, and $\det(\mathbf{B})$ denote the principal square root, inverse, trace, and determinant of $\mathbf{B}$, respectively.
- Special matrices: $\mathbf{I}_N$ is the $N\times N$ identity matrix, and $\mathbf{0}_N$ denotes either the $N\times N$ zero matrix or a zero vector, depending on the context.
- Norms and spectral radius: $\|\cdot\|$ represents the Euclidean norm of an input vector or the spectral norm of an input matrix. $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, and $\rho(\cdot)$ represents the spectral radius (i.e., the largest absolute value of the eigenvalues) of a matrix.
- Other operators: $\mathbb{E}\{\cdot\}$ returns the expectation of an input random entity, and $\log(\cdot)$ is the natural logarithm. $\Re\{\cdot\}$ and $\Im\{\cdot\}$ return the real part and the imaginary part of an input entity, respectively. $\mathbb{1}_{\mathcal{A}}$ denotes the indicator function of the set $\mathcal{A}$, and $\otimes$ is the Kronecker product [31].
- Constants: we use $C$ (or $C'$, $C''$, ...) to denote a universal constant whose value does not depend on matrix sizes but may vary from one appearance to another.
- Convergence and order: almost sure (a.s.) convergence is denoted by $\xrightarrow{a.s.}$. If $\{a_i\}_i$ is a sequence of real numbers, then $b_i = O(a_i)$ and $b_i = o(a_i)$ stand for $|b_i| \le C|a_i|$ and $\lim_{i\to\infty} b_i/a_i = 0$, respectively.
- Number sets: as usual, $\jmath = \sqrt{-1}$, $\mathbb{R}^+ = \{x \in \mathbb{R} : x \ge 0\}$, and $\mathbb{R}^- = \{x \in \mathbb{R} : x < 0\}$. Also, $\mathbb{C}^+ = \{z = z_1 + \jmath z_2 \in \mathbb{C} : z_2 > 0\}$ and $\mathbb{C}^- = \{z = z_1 + \jmath z_2 \in \mathbb{C} : z_2 < 0\}$.

² Note that if the LOS component is absent, we allow the $\mathbf{R}_k$'s to be general nonnegative definite matrices. See Section III for details.

II. Channel Model and Problem Formulation 

A. Multi-cell MIMO-MAC with LOS and Spatial Correlation

As shown in Figure 1, we consider a MIMO-MAC system with $K$ UEs, labeled UE$_1, \ldots,$ UE$_K$, which are equipped with $n_1, \ldots, n_K$ antennas, respectively. The $K$ UEs transmit simultaneously to $N$ interconnected single-antenna small-cell BSs. In this paper, we use the Kronecker model to characterize the spatial correlation of the MIMO channel for each link, so that the correlation properties at the BS and at any UE are modeled separately, e.g., [32]. Specifically, user $k$'s channel, $\mathbf{H}_k \in \mathbb{C}^{N\times n_k}$, can be written as

$$\mathbf{H}_k = \mathbf{R}_k^{1/2}\mathbf{X}_k\mathbf{T}_k^{1/2} + \bar{\mathbf{H}}_k, \tag{1}$$

where the terms are defined as follows:

- $\mathbf{R}_k = \mathrm{diag}(r_{k,1}, \ldots, r_{k,N})$ is a deterministic diagonal matrix whose $i$th diagonal entry, $r_{k,i}$, is the channel gain from UE$_k$ to the $i$th receive antenna (i.e., the $i$th BS);
- $\mathbf{T}_k \in \mathbb{C}^{n_k\times n_k}$ is a deterministic nonnegative definite matrix that describes the correlation of the transmit signals across the antenna elements of UE$_k$;
- $\mathbf{X}_k = [\frac{1}{\sqrt{n_k}}X_{ij}^{(k)}] \in \mathbb{C}^{N\times n_k}$ contains the random components of the channel, whose elements $\{X_{ij}^{(k)}\}_{1\le i\le N,\,1\le j\le n_k}$ are i.i.d. complex random variables with zero mean and variance $P_k$;
- $\bar{\mathbf{H}}_k \in \mathbb{C}^{N\times n_k}$ is a deterministic matrix corresponding to the LOS component of the channel.

To define the signal-to-noise ratio (SNR) properly, consider the power of the channel:

$$\mathbb{E}\left[\mathrm{tr}\left(\mathbf{H}_k\mathbf{H}_k^H\right)\right] = \frac{P_k}{n_k}\mathrm{tr}(\mathbf{R}_k)\,\mathrm{tr}(\mathbf{T}_k) + \mathrm{tr}\left(\bar{\mathbf{H}}_k\bar{\mathbf{H}}_k^H\right). \tag{2}$$

It is customarily assumed that $\mathbf{R}_k$, $\mathbf{T}_k$, and $\bar{\mathbf{H}}_k$ are normalized such that $\mathrm{tr}(\mathbf{R}_k) = N$, $\mathrm{tr}(\mathbf{T}_k) = n_k$, and $\mathrm{tr}(\bar{\mathbf{H}}_k\bar{\mathbf{H}}_k^H) = N$. In this way, $P_k$ can be used as an indicator of the SNR of user $k$ that is independent of the matrix dimensions. For notational brevity, we henceforth assume that $P_k = 1$ for all $k$ without loss of generality.³
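The normalization in (2) is easy to sanity-check numerically. The sketch below is an illustration only. It assumes NumPy, circular complex Gaussian $X_{ij}^{(k)}$, and randomly generated $\mathbf{R}_k$, $\mathbf{T}_k$, and $\bar{\mathbf{H}}_k$ scaled to the traces stated above. The helper names (`random_psd`, `sqrtm_psd`, `channel_power`) and all sizes are hypothetical choices. The code compares a Monte Carlo average of $\mathrm{tr}(\mathbf{H}_k\mathbf{H}_k^H)$ under model (1) with the closed form (2).

```python
import numpy as np


def random_psd(n, rng):
    """Random Hermitian nonnegative definite matrix scaled so that tr = n."""
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    t = a @ a.conj().T
    return t * (n / np.trace(t).real)


def sqrtm_psd(a):
    """Principal square root of a Hermitian nonnegative definite matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T


def channel_power(N=32, n_k=8, P_k=2.0, trials=2000, seed=0):
    """Monte Carlo estimate of E[tr(H_k H_k^H)] versus the closed form (2)."""
    rng = np.random.default_rng(seed)
    R = np.diag(rng.uniform(0.2, 1.8, N))
    R *= N / np.trace(R)                      # tr(R_k) = N
    T = random_psd(n_k, rng)                  # tr(T_k) = n_k
    Hbar = rng.standard_normal((N, n_k)) + 1j * rng.standard_normal((N, n_k))
    Hbar *= np.sqrt(N / np.trace(Hbar @ Hbar.conj().T).real)   # tr(Hbar Hbar^H) = N
    R_half, T_half = sqrtm_psd(R), sqrtm_psd(T)
    total = 0.0
    for _ in range(trials):
        # X_ij^(k): zero mean, variance P_k (circular complex Gaussian in this sketch)
        Z = np.sqrt(P_k / 2) * (rng.standard_normal((N, n_k))
                                + 1j * rng.standard_normal((N, n_k)))
        H = R_half @ (Z / np.sqrt(n_k)) @ T_half + Hbar        # model (1)
        total += np.trace(H @ H.conj().T).real
    empirical = total / trials
    closed_form = (P_k / n_k) * np.trace(R).real * np.trace(T).real \
        + np.trace(Hbar @ Hbar.conj().T).real
    return empirical, closed_form


emp, form = channel_power()
# With the normalizations above, the closed form equals P_k * N + N = 96.
print(f"Monte Carlo: {emp:.2f}   closed form (2): {form:.2f}")
```

The cross term between the random part and $\bar{\mathbf{H}}_k$ has zero mean, which is why it does not appear in (2).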

A diagonal structure for $\mathbf{R}_k$ is sufficient to model the SCN scenarios under investigation. However, allowing the more general nonnegative definite structure for $\mathbf{R}_k$ would extend our results to more complex applications. Unfortunately, in the presence of $\bar{\mathbf{H}}_k$, the required analysis becomes extremely arduous. As a result, if $\bar{\mathbf{H}}_k \neq \mathbf{0}$ for some $k$, we restrict our consideration to diagonal $\mathbf{R}_k$'s. Nevertheless, if $\bar{\mathbf{H}}_k = \mathbf{0}$ for all $k$, the results presented in Section III remain valid for general nonnegative definite $\mathbf{R}_k$'s.

B. Mutual Information and Stieltjes Transform 

Mutual information measures the achievable rate of a channel and has been a key metric for performance analysis in wireless communications. The Stieltjes transform provides a convenient tool to study the behavior of random matrices in large dimensional RMT. We first explain how the two are related.

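Define $\mathbf{H} = [\mathbf{H}_1 \cdots \mathbf{H}_K]$, $\bar{\mathbf{H}} = [\bar{\mathbf{H}}_1 \cdots \bar{\mathbf{H}}_K]$, and $n = \sum_{k=1}^K n_k$. The mutual information of the MIMO channel can then be linked to the eigenvalues of a nonnegative definite matrix $\mathbf{B}_N$ of the form

$$\mathbf{B}_N = \mathbf{S} + \mathbf{H}\mathbf{H}^H = \mathbf{S} + \sum_{k=1}^K \left(\mathbf{R}_k^{1/2}\mathbf{X}_k\mathbf{T}_k^{1/2} + \bar{\mathbf{H}}_k\right)\left(\mathbf{R}_k^{1/2}\mathbf{X}_k\mathbf{T}_k^{1/2} + \bar{\mathbf{H}}_k\right)^H, \tag{3}$$

in which $\mathbf{S}$ accounts for a source of correlated interference, and $\mathbf{S}^{1/2}$ denotes its nonnegative definite square root. Let $F^{\mathbf{B}_N}$ be the empirical spectral distribution (ESD) of the eigenvalues of $\mathbf{B}_N$, given by

$$F^{\mathbf{B}_N}(\lambda) = \frac{1}{N}\,\#\{\text{eigenvalues of } \mathbf{B}_N \le \lambda\}. \tag{4}$$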



³ For practical applications, $P_k$ can be set to any positive value without affecting the results of this paper.
One of the main problems in large dimensional RMT is to study the limiting spectral distribution (LSD) of $\mathbf{B}_N$, denoted by $F_N$. A convenient tool for this is the Stieltjes transform of $F^{\mathbf{B}_N}(\lambda)$, which is defined as

$$m_{\mathbf{B}_N}(z) \triangleq \int_{\mathbb{R}^+}\frac{1}{\lambda - z}\,dF^{\mathbf{B}_N}(\lambda) = \frac{1}{N}\mathrm{tr}\left(\mathbf{B}_N - z\mathbf{I}_N\right)^{-1} \quad \text{for } z \in \mathbb{C} - \mathbb{R}^+. \tag{5}$$

We denote by $\mathcal{S}(\mathbb{R}^+)$ the class of all Stieltjes transforms of finite positive measures carried by $\mathbb{R}^+$. The Stieltjes transform provides a direct way to identify the LSD of large dimensional random matrices. Some useful properties of Stieltjes transforms are listed in Lemma 17. According to [10,34], showing that the difference between $F^{\mathbf{B}_N}$ and $F_N$ converges vaguely to zero is equivalent to showing that

$$m_{\mathbf{B}_N}(z) - m_N(z) \xrightarrow{a.s.} 0 \quad \text{for } z \in \mathbb{C} - \mathbb{R}^+, \tag{6}$$

where $m_N(z) = \int_{\mathbb{R}^+}\frac{1}{\lambda - z}\,dF_N(\lambda)$ is the Stieltjes transform of $F_N$.
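To make (5) concrete, the short sketch below evaluates the Stieltjes transform of an empirical spectral distribution in two equivalent ways. The first uses the resolvent trace $\frac{1}{N}\mathrm{tr}(\mathbf{B}_N - z\mathbf{I}_N)^{-1}$. The second averages $1/(\lambda - z)$ over the eigenvalues of $\mathbf{B}_N$, i.e., integrates against $F^{\mathbf{B}_N}$. It assumes NumPy; the matrix sizes, the random test matrix, and the point $z$ are arbitrary illustrative choices, and the function names are hypothetical.

```python
import numpy as np


def stieltjes_resolvent(B, z):
    """(1/N) tr (B - z I)^{-1}: the right-hand side of (5)."""
    N = B.shape[0]
    return np.trace(np.linalg.inv(B - z * np.eye(N))) / N


def stieltjes_esd(B, z):
    """Integral of 1/(lambda - z) against the ESD F^B, i.e. an average over eigenvalues."""
    lam = np.linalg.eigvalsh(B)
    return np.mean(1.0 / (lam - z))


rng = np.random.default_rng(1)
N, n = 50, 100
X = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2 * n)
B = X @ X.conj().T          # nonnegative definite, like B_N in (3) with S = 0
z = -0.5 + 0.3j             # a point of C minus R^+
m_resolvent = stieltjes_resolvent(B, z)
m_esd = stieltjes_esd(B, z)
print(m_resolvent, m_esd)
```

For $\Im z > 0$ the imaginary part of $m_{\mathbf{B}_N}(z)$ is positive. This is one of the standard properties that characterize $\mathcal{S}(\mathbb{R}^+)$.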

For wireless communications, the importance of the Stieltjes transform stems from the fact that many important performance metrics can be expressed as functions of the Stieltjes transform of $\mathbf{B}_N$. In particular, the mutual information can be expressed as a functional of the Stieltjes transform of $\mathbf{B}_N$ through the so-called Shannon transform [14, Section 2.2.3] (or [29, page 891]):

$$\mathcal{V}_{\mathbf{B}_N}(\sigma^2) = \frac{1}{N}\log\det\left(\mathbf{I}_N + \frac{1}{\sigma^2}\mathbf{H}\mathbf{H}^H\right) = \int_0^\infty \log\left(1 + \frac{\lambda}{\sigma^2}\right)dF^{\mathbf{B}_N}(\lambda) = \int_{\sigma^2}^\infty \left(\frac{1}{\omega} - m_{\mathbf{B}_N}(-\omega)\right)d\omega \quad \text{for } \sigma^2 > 0, \tag{7}$$

where it is assumed that $\mathbf{S} = \mathbf{0}_N$ for simplicity.⁴ Here, $\mathcal{V}_{\mathbf{B}_N}(\sigma^2)$ is a performance metric giving the number of nats (since $\log$ is the natural logarithm) per second per Hertz per antenna that can be transmitted reliably over the SCN with channel matrices $\{\mathbf{H}_k\}_{k=1,\ldots,K}$.
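The last equality in (7) can be checked numerically from the eigenvalues of a single realization. In the sketch below, the substitution $u = 1/\omega$ maps $[\sigma^2, \infty)$ to the finite interval $[0, 1/\sigma^2]$. There the integrand $(1/\omega - m_{\mathbf{B}_N}(-\omega))\,\omega^2$ is smooth, so Gauss-Legendre quadrature (`numpy.polynomial.legendre.leggauss`) applies. This is a minimal sketch assuming NumPy and $\mathbf{S} = \mathbf{0}$. The quadrature rule, the node count, and the sizes are our own choices, not part of the derivation in the text.

```python
import numpy as np


def shannon_direct(lam, sigma2):
    """(1/N) log det(I_N + H H^H / sigma2), written via the eigenvalues of H H^H."""
    return np.mean(np.log1p(lam / sigma2))


def stieltjes_at(lam, z):
    """m_B(z) = mean of 1/(lambda - z) over the eigenvalues of B, as in (5)."""
    return np.mean(1.0 / (lam - z))


def shannon_via_stieltjes(lam, sigma2, nodes=64):
    """Right-hand side of (7): integral over [sigma2, inf) of 1/w - m_B(-w).

    With u = 1/w the integral becomes one over [0, 1/sigma2] with the smooth
    integrand (1/w - m_B(-w)) * w**2, evaluated by Gauss-Legendre quadrature.
    """
    x, weights = np.polynomial.legendre.leggauss(nodes)
    u = 0.5 * (x + 1.0) / sigma2            # nodes mapped from (-1, 1) to (0, 1/sigma2)
    w = 1.0 / u
    integrand = np.array([(1.0 / wi - stieltjes_at(lam, -wi)) * wi**2 for wi in w])
    return 0.5 / sigma2 * np.sum(weights * integrand)


rng = np.random.default_rng(3)
N, n, sigma2 = 40, 80, 0.5
H = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2 * n)
lam = np.linalg.eigvalsh(H @ H.conj().T)    # eigenvalues of B_N with S = 0
v_direct = shannon_direct(lam, sigma2)
v_stieltjes = shannon_via_stieltjes(lam, sigma2)
print(v_direct, v_stieltjes)
```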

In this paper, we are particularly interested in understanding the Stieltjes transform as well as the Shannon transform of $\mathbf{B}_N$ in the following asymptotic regime. $K$ is fixed, and $N, n_1, \ldots, n_K$ all grow to infinity with ratios $\{\beta_k(N) = N/n_k\}_{k=1,\ldots,K}$ such that

$$0 < \beta_{\min} \le \min_k \liminf_N \beta_k(N) \le \max_k \limsup_N \beta_k(N) \le \beta_{\max} < \infty. \tag{8}$$

For convenience, we will refer to this asymptotic regime simply as $\mathcal{N} \to \infty$. Our main goal is to find a nonrandom matrix-valued function $\boldsymbol{\Psi}(z)$ (to be determined later) such that

$$m_{\mathbf{B}_N}(z) - \frac{1}{N}\mathrm{tr}\left(\boldsymbol{\Psi}(z)\right) \xrightarrow{a.s.} 0 \quad \text{for } z \in \mathbb{C} - \mathbb{R}^+. \tag{9}$$

This type of relation is referred to as a deterministic equivalent [29], and $\frac{1}{N}\mathrm{tr}(\boldsymbol{\Psi}(z))$ is said to be the deterministic equivalent of $m_{\mathbf{B}_N}(z)$. We will apply (9) to find a deterministic equivalent of the ergodic mutual information $\mathbb{E}\{\mathcal{V}_{\mathbf{B}_N}(\sigma^2)\}$, denoted by $\mathcal{V}_N(\sigma^2)$. We achieve this by proving that $\mathbb{E}\{\mathcal{V}_{\mathbf{B}_N}(\sigma^2)\} - \mathcal{V}_N(\sigma^2) \to 0$. In general, computing $\mathbb{E}\{\mathcal{V}_{\mathbf{B}_N}(\sigma^2)\}$ relies on time-consuming Monte Carlo simulations, whereas the deterministic equivalent is analytical and much easier to compute.

⁴ The generalization of the corresponding results to the case with $\mathbf{S} \neq \mathbf{0}_N$ is straightforward. In that case, the related performance metric (i.e., the mutual information) is given by $\frac{1}{N}\log\det\left(\mathbf{I}_N + \mathbf{S} + \frac{1}{\sigma^2}\mathbf{H}\mathbf{H}^H\right) - \frac{1}{N}\log\det\left(\mathbf{I}_N + \mathbf{S}\right)$.

In wireless communications applications, the Stieltjes transform itself may also be used to characterize the asymptotic signal-to-interference-plus-noise ratio (SINR) of certain communication models, such as that of [9]. These illustrations are just a few of the many important applications of the Stieltjes transform in wireless communications; for a thorough survey of other applications, see [14, 15]. To build reliable wireless communication applications on these tools, new analytical results are required concerning the LSD and the Stieltjes transform in the asymptotic regime.

III. Main Results

Before we present our main results, we first state the assumptions imposed in our SCN model. 
A. Assumptions 

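Assumption 1 Let $\mathbf{X}_k = [\frac{1}{\sqrt{n_k}}X_{ij}^{(k)}] \in \mathbb{C}^{N\times n_k}$, where the $X_{ij}^{(k)}$'s are i.i.d. complex random variables with independent real and imaginary parts such that $\mathbb{E}\{X_{ij}^{(k)}\} = 0$ and $\mathbb{E}\{|X_{ij}^{(k)} - \mathbb{E}\{X_{ij}^{(k)}\}|^2\} = 1$.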
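Assumption 1 admits many non-Gaussian distributions. As a hedged illustration, the sketch below draws entries whose real and imaginary parts are independent and either uniform or binary (QPSK-like). It assumes NumPy; the two generators are our own examples, not distributions singled out by the analysis. It checks empirically that both have zero mean and unit variance. Their fourth moments $\mathbb{E}\{|x|^4\}$ (1.4 and 1) differ from the value 2 of a circularly symmetric complex Gaussian variable, so they are genuinely non-Gaussian.

```python
import numpy as np


def uniform_entries(shape, rng):
    # Independent Uniform[-a, a] real and imaginary parts with a = sqrt(3/2):
    # each part has variance a**2 / 3 = 1/2, so E|x|^2 = 1.
    a = np.sqrt(1.5)
    return rng.uniform(-a, a, shape) + 1j * rng.uniform(-a, a, shape)


def qpsk_entries(shape, rng):
    # Independent +-1/sqrt(2) real and imaginary parts, so |x|^2 = 1 exactly.
    s = 1.0 / np.sqrt(2.0)
    return s * rng.choice([-1.0, 1.0], size=shape) + 1j * s * rng.choice([-1.0, 1.0], size=shape)


rng = np.random.default_rng(2)
stats = {}
for name, generator in [("uniform", uniform_entries), ("qpsk", qpsk_entries)]:
    x = generator((400, 400), rng)
    mean = x.mean()
    stats[name] = {
        "abs_mean": abs(mean),
        "variance": np.mean(np.abs(x - mean) ** 2),
        "fourth_moment": np.mean(np.abs(x) ** 4),   # equals 2 for circular complex Gaussian
    }
    print(name, stats[name])
```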

Assumption 2 The deterministic matrices $\{\mathbf{R}_k, \mathbf{T}_k, \mathbf{S}\}_{\forall k}$ are nonnegative definite.
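Assumption 3 The matrices $\mathbf{R}_k$, $\mathbf{T}_k$, and $\bar{\mathbf{H}}_k$ are normalized such that

$$\mathrm{tr}(\mathbf{R}_k) \le N, \quad \mathrm{tr}(\mathbf{T}_k) \le n_k, \quad \mathrm{tr}\left(\bar{\mathbf{H}}_k\bar{\mathbf{H}}_k^H\right) \le N. \tag{10}$$

Clearly, because of the normalization constraint in (10), the sequences $\{F^{\mathbf{T}_k}\}_{\forall k}$ are tight in $n_k$, while $\{F^{\mathbf{R}_k}\}_{\forall k}$ and $\{F^{\bar{\mathbf{H}}_k\bar{\mathbf{H}}_k^H}\}_{\forall k}$ are tight in $N$. This means that, for each fixed $\epsilon \in (0,1)$, we can always select an $a > 0$ with the following properties: $F^{\mathbf{T}_k}(a) > 1 - \epsilon$ for all $n_k$, and $F^{\mathbf{R}_k}(a) > 1 - \epsilon$ and $F^{\bar{\mathbf{H}}_k\bar{\mathbf{H}}_k^H}(a) > 1 - \epsilon$ for all $N$.

Assumption 4 The deterministic matrices $\{\mathbf{R}_k\}_{\forall k}$ are diagonal with nonnegative elements.

Notice that Assumption 4 requires $\mathbf{R}_k$ to be diagonal, which is more restrictive than Assumption 2. However, this assumption is still satisfied in the SCN application under investigation. Also, Assumption 4 is not required by some of the theorems presented in this paper.

B. Main Results

We first introduce some properties of the deterministic matrix-valued function $\boldsymbol{\Psi}(z)$, which is needed in the deterministic equivalents of the Stieltjes transform and the ergodic mutual information. To simplify our expressions, we define the notation $(\mathbf{A})_k$. It returns the submatrix of $\mathbf{A}$ obtained by extracting the elements of the rows and columns with indices from $\sum_{i=1}^{k-1}n_i + 1$ to $\sum_{i=1}^{k}n_i$.

Theorem 1 Let $\beta_k = N/n_k$. Under Assumption 2, consider the deterministic system of equations

$$e_i(z) = \frac{1}{N}\mathrm{tr}\left(\mathbf{R}_i\boldsymbol{\Psi}(z)\right), \quad 1 \le i \le K, \tag{11a}$$

$$\tilde{e}_i(z) = \frac{1}{n_i}\mathrm{tr}\left(\mathbf{T}_i\left(\tilde{\boldsymbol{\Psi}}(z)\right)_i\right), \quad 1 \le i \le K, \tag{11b}$$

where

$$\boldsymbol{\Psi}(z) = \left(\boldsymbol{\Phi}(z)^{-1} - z\bar{\mathbf{H}}\tilde{\boldsymbol{\Phi}}(z)\bar{\mathbf{H}}^H\right)^{-1}, \tag{12a}$$

$$\tilde{\boldsymbol{\Psi}}(z) = \left(\tilde{\boldsymbol{\Phi}}(z)^{-1} - z\bar{\mathbf{H}}^H\boldsymbol{\Phi}(z)\bar{\mathbf{H}}\right)^{-1}, \tag{12b}$$

$$\boldsymbol{\Phi}(z) = -\frac{1}{z}\left(-\frac{1}{z}\mathbf{S} + \sum_{i=1}^K \tilde{e}_i(z)\mathbf{R}_i + \mathbf{I}_N\right)^{-1}, \tag{12c}$$

$$\tilde{\boldsymbol{\Phi}}(z) = -\frac{1}{z}\,\mathrm{diag}\left(\left(\mathbf{I}_{n_1} + \beta_1 e_1(z)\mathbf{T}_1\right)^{-1}, \ldots, \left(\mathbf{I}_{n_K} + \beta_K e_K(z)\mathbf{T}_K\right)^{-1}\right). \tag{12d}$$

This system has a unique solution for $z \in \mathbb{C} - \mathbb{R}^+$. In particular, $e_i(z) \in \mathcal{S}(\mathbb{R}^+)$ and $\tilde{e}_i(z) \in \mathcal{S}(\mathbb{R}^+)$ for $i \in \{1, \ldots, K\}$.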
Proof: See Appendix B. □ 
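A direct way to evaluate the deterministic equivalent is to iterate (11a)-(12d) as a fixed-point map. The sketch below does this at a real point $z = -\sigma^2$. As an informal numerical check of Theorem 2 under condition 3), it then compares $\frac{1}{N}\mathrm{tr}\,\boldsymbol{\Psi}(z)$ with a Monte Carlo average of $m_{\mathbf{B}_N}(z)$. The check uses QPSK-like (non-Gaussian) entries, diagonal $\mathbf{R}_k$, non-diagonal $\mathbf{T}_k$, a LOS component, and $\mathbf{S} = 0.3\,\mathbf{I}_N$. Please read it as a sketch. NumPy is assumed. The plain (undamped) fixed-point iteration and its zero starting point are our own choices, not a procedure prescribed here, and its convergence is assumed rather than proved. All sizes and test matrices are arbitrary. At these moderate dimensions the two numbers should agree only to within a few percent.

```python
import numpy as np


def sqrtm_psd(a):
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T


def random_psd(n, rng):
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    t = a @ a.conj().T
    return t * (n / np.trace(t).real)               # tr = n


def deterministic_equivalent(R, T, Hbar, S, z, iters=1000, tol=1e-12):
    """Plain fixed-point iteration on (11a)-(12d); returns Psi(z), e, e_tilde."""
    N, K = S.shape[0], len(R)
    ns = [t.shape[0] for t in T]
    off = np.cumsum([0] + ns)
    beta = [N / nk for nk in ns]
    e, et = np.zeros(K, complex), np.zeros(K, complex)
    for _ in range(iters):
        Phi = -np.linalg.inv(np.eye(N) - S / z + sum(et[i] * R[i] for i in range(K))) / z  # (12c)
        Phit = np.zeros((off[-1], off[-1]), complex)                                         # (12d)
        for i in range(K):
            Phit[off[i]:off[i + 1], off[i]:off[i + 1]] = \
                -np.linalg.inv(np.eye(ns[i]) + beta[i] * e[i] * T[i]) / z
        Psi = np.linalg.inv(np.linalg.inv(Phi) - z * Hbar @ Phit @ Hbar.conj().T)           # (12a)
        Psit = np.linalg.inv(np.linalg.inv(Phit) - z * Hbar.conj().T @ Phi @ Hbar)          # (12b)
        e_new = np.array([np.trace(R[i] @ Psi) / N for i in range(K)])                      # (11a)
        et_new = np.array([np.trace(T[i] @ Psit[off[i]:off[i + 1], off[i]:off[i + 1]]) / ns[i]
                           for i in range(K)])                                               # (11b)
        change = max(np.max(np.abs(e_new - e)), np.max(np.abs(et_new - et)))
        e, et = e_new, et_new
        if change < tol:
            break
    return Psi, e, et


rng = np.random.default_rng(4)
N, ns, z = 80, [40, 60], -1.0
R = [np.diag(rng.uniform(0.2, 1.8, N)) for _ in ns]
R = [r * (N / np.trace(r)) for r in R]              # diagonal R_k (Assumption 4), tr = N
T = [random_psd(nk, rng) for nk in ns]              # non-diagonal T_k, tr = n_k
Hbar_k = [rng.standard_normal((N, nk)) + 1j * rng.standard_normal((N, nk)) for nk in ns]
Hbar_k = [h * np.sqrt(0.5 * N / np.trace(h @ h.conj().T).real) for h in Hbar_k]   # tr = N/2
Hbar = np.hstack(Hbar_k)
S = 0.3 * np.eye(N)

Psi, e, et = deterministic_equivalent(R, T, Hbar, S, z)
m_de = np.trace(Psi).real / N

R_half = [sqrtm_psd(r) for r in R]
T_half = [sqrtm_psd(t) for t in T]
samples = []
for _ in range(40):
    B = S.astype(complex)
    for k, nk in enumerate(ns):
        # QPSK-like entries: zero mean, unit variance, non-Gaussian
        X = (rng.choice([-1.0, 1.0], (N, nk)) + 1j * rng.choice([-1.0, 1.0], (N, nk))) / np.sqrt(2 * nk)
        Hk = R_half[k] @ X @ T_half[k] + Hbar_k[k]
        B = B + Hk @ Hk.conj().T
    samples.append(np.trace(np.linalg.inv(B - z * np.eye(N))).real / N)
m_mc = float(np.mean(samples))
print(f"deterministic equivalent: {m_de:.5f}   Monte Carlo: {m_mc:.5f}")
```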



We next provide a deterministic equivalent for the Stieltjes transform of $\mathbf{B}_N$.

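Theorem 2 In addition to Assumptions 1, 2, and 3, suppose that one of the following conditions holds:

1) $K = 1$;

2) $\bar{\mathbf{H}} = \mathbf{0}$ with $1 \le K < \infty$;

3) Assumption 4 holds with $1 \le K < \infty$.

Then, as $\mathcal{N} \to \infty$, we have

$$m_{\mathbf{B}_N}(z) - \frac{1}{N}\mathrm{tr}\left(\boldsymbol{\Psi}(z)\right) \xrightarrow{a.s.} 0 \quad \text{for } z \in \mathbb{C} - \mathbb{R}^+. \tag{13}$$

Proof: Section III-C is dedicated to the proof of Theorem 2. □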

Remark 1 According to (1), the non-central part of the channel is captured by $\bar{\mathbf{H}}_k$. Hence, we set $\mathbb{E}\{X_{ij}^{(k)}\} = 0$ in Assumption 1 for conceptual clarity. In fact, the assumption $\mathbb{E}\{X_{ij}^{(k)}\} = 0$ can be removed from Theorem 2 if all the $X_{ij}^{(k)}$'s have the same mean. Removing a common mean from the entries of $\mathbf{X}_k$ does not affect the LSD of $F^{\mathbf{B}_N}$. See Appendix A.1 for details.

When $K = 1$, $\mathbf{S} = \mathbf{0}$, and $\bar{\mathbf{H}}_1 = \mathbf{0}$, Zhang's result [35] allows $\mathbf{R}_1$ to be an arbitrary nonnegative definite matrix and $\mathbf{T}_1$ to be Hermitian. For $K = 1$ and $\bar{\mathbf{H}}_1 = \mathbf{0}$, Pan [37] tackled the cases where $\mathbf{T}_1$ and $\mathbf{S}$ are arbitrary Hermitian matrices but $\mathbf{R}_1 = \mathbf{I}_N$. Using a new approach based on the generalized Lindeberg principle [38], we can handle both cases in [35, 37] with simpler proofs. As a result, Theorem 2 states that (13) holds when $K = 1$ without requiring $\mathbf{R}_1$ or $\mathbf{T}_1$ to be diagonal and without requiring $\bar{\mathbf{H}}_1 = \mathbf{0}$. This result also includes the case in [29, Section 3.2] as a special case.

For the general case with $1 < K < \infty$ but $\bar{\mathbf{H}} = \mathbf{0}$, Couillet et al. pointed out in [7, Corollary 1] that (13) holds when the entries of $\mathbf{X}_k$ are i.i.d. Gaussian random variables. When the entries of $\mathbf{X}_k$ are non-Gaussian, (13) was derived in [7, Theorem 1] under the assumption that the $\mathbf{T}_k$'s are diagonal. Therefore, Theorem 2 is more general than these previous studies in the sense that the entries of $\mathbf{X}_k$ need not be Gaussian and $\mathbf{T}_k$ need not be diagonal, as conjectured in [7].

When $1 < K < \infty$ and the LOS component $\bar{\mathbf{H}}$ is present, we require the $\mathbf{R}_k$'s to be diagonal because of mathematical difficulties. It is worth pointing out, however, that by our argument (13) also holds in this case if the matrices $\mathbf{R}_1, \ldots, \mathbf{R}_K$ are simultaneously unitarily diagonalizable. This type of channel model is the so-called "virtual channel representation" [41] and has been found useful for modeling channels with many antennas [42].

As an application, we next use Theorem 2 to provide a deterministic equivalent of the ergodic mutual 
information in the following theorem. 

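Theorem 3 Assume that $\mathbf{B}_N$ satisfies the hypotheses of Theorem 2 and $\mathbf{S} = \mathbf{0}$. Then, as $\mathcal{N} \to \infty$, the Shannon transform of $\mathbf{B}_N$ satisfies

$$\mathbb{E}\left\{\mathcal{V}_{\mathbf{B}_N}(\sigma^2)\right\} - \mathcal{V}_N(\sigma^2) \to 0, \tag{14}$$

where

$$\mathcal{V}_N(\sigma^2) = \frac{1}{N}\log\det\left(\frac{1}{\sigma^2}\boldsymbol{\Phi}(-\sigma^2)^{-1} + \bar{\mathbf{H}}\tilde{\boldsymbol{\Phi}}(-\sigma^2)\bar{\mathbf{H}}^H\right) + \frac{1}{N}\log\det\left(\frac{1}{\sigma^2}\tilde{\boldsymbol{\Phi}}(-\sigma^2)^{-1}\right) - \sigma^2\sum_{i=1}^K e_i(-\sigma^2)\,\tilde{e}_i(-\sigma^2). \tag{15}$$

Proof: (15) is an explicit expression of $\int_{\sigma^2}^\infty \left(\frac{1}{\omega} - \frac{1}{N}\mathrm{tr}\left(\boldsymbol{\Psi}(-\omega)\right)\right)d\omega$. The proofs of the convergence and of the explicit expression are given in Appendix C.

□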
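Formula (15) can be evaluated once the fixed point of Theorem 1 is available at $z = -\sigma^2$. The hedged sketch below compares $\mathcal{V}_N(\sigma^2)$ from (15) with a Monte Carlo estimate of $\mathbb{E}\{\mathcal{V}_{\mathbf{B}_N}(\sigma^2)\}$. It assumes NumPy and restates an illustrative plain fixed-point iteration of (11a)-(12d) with $\mathbf{S} = \mathbf{0}$; convergence of that iteration is assumed rather than proved. The Monte Carlo part uses entries with independent uniform real and imaginary parts. The sizes, the matrices, and this entry distribution are arbitrary test choices, so only approximate agreement (a few percent) should be expected.

```python
import numpy as np


def sqrtm_psd(a):
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T


def fixed_point(R, T, Hbar, sigma2, iters=1000, tol=1e-12):
    """Iterate (11a)-(12d) at z = -sigma2 with S = 0; returns Phi, Phi_tilde, e, e_tilde."""
    z, N, K = -sigma2, R[0].shape[0], len(R)
    ns = [t.shape[0] for t in T]
    off = np.cumsum([0] + ns)
    e, et = np.zeros(K), np.zeros(K)
    for _ in range(iters):
        Phi = -np.linalg.inv(np.eye(N) + sum(et[i] * R[i] for i in range(K))) / z
        Phit = np.zeros((off[-1], off[-1]), complex)
        for i in range(K):
            Phit[off[i]:off[i + 1], off[i]:off[i + 1]] = \
                -np.linalg.inv(np.eye(ns[i]) + (N / ns[i]) * e[i] * T[i]) / z
        Psi = np.linalg.inv(np.linalg.inv(Phi) - z * Hbar @ Phit @ Hbar.conj().T)
        Psit = np.linalg.inv(np.linalg.inv(Phit) - z * Hbar.conj().T @ Phi @ Hbar)
        e_new = np.array([np.trace(R[i] @ Psi).real / N for i in range(K)])
        et_new = np.array([np.trace(T[i] @ Psit[off[i]:off[i + 1], off[i]:off[i + 1]]).real / ns[i]
                           for i in range(K)])
        change = max(np.max(np.abs(e_new - e)), np.max(np.abs(et_new - et)))
        e, et = e_new, et_new
        if change < tol:
            break
    return Phi, Phit, e, et


def v_deterministic(R, T, Hbar, sigma2):
    """Deterministic equivalent V_N(sigma^2) of (15)."""
    N = R[0].shape[0]
    Phi, Phit, e, et = fixed_point(R, T, Hbar, sigma2)
    first = np.linalg.slogdet(np.linalg.inv(Phi) / sigma2 + Hbar @ Phit @ Hbar.conj().T)[1]
    second = np.linalg.slogdet(np.linalg.inv(Phit) / sigma2)[1]
    return (first + second) / N - sigma2 * np.sum(e * et)


rng = np.random.default_rng(5)
N, ns, sigma2 = 80, [40, 60], 0.5
R = [np.diag(rng.uniform(0.2, 1.8, N)) for _ in ns]
R = [r * (N / np.trace(r)) for r in R]                      # diagonal R_k, tr = N
T = []
for nk in ns:
    g = rng.standard_normal((nk, nk)) + 1j * rng.standard_normal((nk, nk))
    t = g @ g.conj().T
    T.append(t * (nk / np.trace(t).real))                   # non-diagonal T_k, tr = n_k
Hbar_k = [rng.standard_normal((N, nk)) + 1j * rng.standard_normal((N, nk)) for nk in ns]
Hbar_k = [h * np.sqrt(0.5 * N / np.trace(h @ h.conj().T).real) for h in Hbar_k]
Hbar = np.hstack(Hbar_k)

v_de = v_deterministic(R, T, Hbar, sigma2)

a = np.sqrt(1.5)   # uniform real/imag parts, variance 1/2 each (Assumption 1)
R_half, T_half = [sqrtm_psd(r) for r in R], [sqrtm_psd(t) for t in T]
samples = []
for _ in range(40):
    H = np.hstack([R_half[k] @ ((rng.uniform(-a, a, (N, nk)) + 1j * rng.uniform(-a, a, (N, nk)))
                                / np.sqrt(nk)) @ T_half[k] + Hbar_k[k]
                   for k, nk in enumerate(ns)])
    samples.append(np.linalg.slogdet(np.eye(N) + H @ H.conj().T / sigma2)[1] / N)
v_mc = float(np.mean(samples))
print(f"V_N from (15): {v_de:.4f}   Monte Carlo: {v_mc:.4f}")
```

`numpy.linalg.slogdet` is used instead of `det` to avoid overflow. For the Hermitian positive definite matrices involved, its log-magnitude output equals $\log\det$.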
Consider two cases: $K = 1$ with $\bar{\mathbf{H}} \neq \mathbf{0}$, and the general case $1 < K < \infty$ with $\bar{\mathbf{H}} = \mathbf{0}$. In these cases, $\mathcal{V}_N(\sigma^2)$ agrees perfectly with the results in [28, Theorem 1] and [7, Theorem 2], respectively. Nevertheless, Theorem 3 is more general than [28, Theorem 1] and [7, Theorem 2] in the sense that it does not require the entries of $\mathbf{X}_k$ to be Gaussian. Note that in these two cases, Theorem 3 allows the $\mathbf{R}_k$'s and $\mathbf{T}_k$'s to be general nonnegative definite matrices. Further, for the general case with $1 < K < \infty$ and $\bar{\mathbf{H}} \neq \mathbf{0}$, Theorem 3 contains [29, Theorem 4.1] as a special case. Theorem 3 requires only the $\mathbf{R}_k$'s to be diagonal, whereas in [29, Theorem 4.1] both the $\mathbf{R}_k$'s and the $\mathbf{T}_k$'s are restricted to be diagonal. Finally, several other contributions (e.g., [28, Theorem 1] and [29, Theorem 4.1]) require the $\mathbf{R}_k$'s, $\mathbf{T}_k$'s, and $\bar{\mathbf{H}}_k$'s to have uniformly bounded spectral norms. Theorem 3, unlike them, holds under the more general trace constraints (10). This relaxation makes Theorem 3 valid for any correlation pattern and LOS component that satisfies (10).

Since the large-system results are invariant to the type of fading distribution, designs based on them are robust to that distribution. The properties of the asymptotically optimal input covariance are likewise invariant to the fading distribution. Specifically, by [7, Proposition 3], we conclude the following for $\bar{\mathbf{H}} = \mathbf{0}$, even when the entries of $\mathbf{X}_k$ are non-Gaussian. The eigenvectors of the asymptotically optimal input covariance matrix align with those of $\mathbf{T}_k$, while its eigenvalues follow a water-filling principle. In [7, 28], an iterative water-filling algorithm based on $\mathcal{V}_N(\sigma^2)$ is provided to obtain the asymptotically optimal input covariance. This iterative algorithm is therefore applicable to all fading distributions covered by our assumptions.
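The water-filling principle mentioned above allocates power across the eigen-directions of the input covariance. As a generic, hedged sketch (NumPy assumed), the function below computes a classical water-filling allocation $p_j = (\mu - 1/g_j)^+$ with $\sum_j p_j = P$ by bisection on the water level $\mu$. The effective gains $g_j$ here are hypothetical inputs. In the iterative algorithm of [7, 28], such gains would be recomputed from the deterministic equivalent at every iteration. That update is not reproduced in this sketch.

```python
import numpy as np


def water_filling(gains, total_power, tol=1e-12):
    """Return p_j = max(mu - 1/g_j, 0) with sum_j p_j = total_power.

    Assumes strictly positive gains; the water level mu is found by bisection.
    """
    g = np.asarray(gains, dtype=float)
    lo, hi = 0.0, total_power + np.max(1.0 / g)   # at mu = hi the allocation exceeds the budget
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        if np.sum(np.maximum(mu - 1.0 / g, 0.0)) > total_power:
            hi = mu
        else:
            lo = mu
    mu = 0.5 * (lo + hi)
    return np.maximum(mu - 1.0 / g, 0.0)


p = water_filling([2.0, 1.0, 0.25], 2.0)
print(p)   # the weakest direction (gain 0.25) receives no power
```

With gains $(2, 1, 0.25)$ and $P = 2$, the water level is $\mu = 1.75$, giving powers $(1.25, 0.75, 0)$.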

Unlike [7, Theorem 2], Theorem 3 does not assert that $\mathcal{V}_{\mathbf{B}_N}(\sigma^2) - \mathcal{V}_N(\sigma^2) \xrightarrow{a.s.} 0$. Although (14) suffices for our applications of interest, we find it important to clarify some properties regarding a.s. convergence. Indeed, following [7, Theorem 2] and using Theorem 2, (14) can be strengthened to a.s. convergence under the additional assumptions stated in the following theorem.

Theorem 4 In addition to the assumptions of Theorem 3, suppose further that 

1) E{|xi?|^}<oo; 

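2) There exist a constant $a$ and a sequence $r_N$ such that, for all $N$,

$$\max_k \max\left\{\lambda_{r_N+1}(\mathbf{R}_k),\, \lambda_{r_N+1}(\mathbf{T}_k),\, \lambda_{r_N+1}\left(\bar{\mathbf{H}}_k\bar{\mathbf{H}}_k^H\right)\right\} \le a,$$

where $\lambda_i(\mathbf{A})$ denotes the $i$th largest eigenvalue of a matrix $\mathbf{A}$;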
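3) Let $b_N$ denote an upper bound on the spectral norms of $\{\mathbf{T}_k, \mathbf{R}_k, \bar{\mathbf{H}}_k\bar{\mathbf{H}}_k^H\}_{1\le k\le K}$, and let $c$ be a constant such that $c > (\cdots)\,(1 + \sqrt{\beta_{\min}})^2$. Then $a_N = c\,b_N$ satisfies

$$r_N\log\left(1 + \frac{a_N}{\sigma^2}\right) = o(N).$$

Under these conditions, (14) can be strengthened to

$$\mathcal{V}_{\mathbf{B}_N}(\sigma^2) - \mathcal{V}_N(\sigma^2) \xrightarrow{a.s.} 0. \tag{16}$$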

Remark 2 Indeed, (16) also holds if assumptions 2) and 3) of Theorem 4 are replaced by the assumption that $\frac{1}{N}\mathrm{tr}(\mathbf{R}_k^2)$, $\frac{1}{N}\mathrm{tr}(\mathbf{T}_k^2)$, and $\frac{1}{N}\mathrm{tr}((\bar{\mathbf{H}}_k\bar{\mathbf{H}}_k^H)^2)$ are bounded for all $k$. A proof is given in Appendix C.2.

Note that neither the assumptions of Theorem 4 nor those of Remark 2 imply the other. It was pointed out in [7] that most conventional models for the $\mathbf{R}_k$'s and $\mathbf{T}_k$'s satisfy the assumptions of Theorem 4. However, the assumptions of Remark 2 cover some cases that are not covered by 2) and 3) of Theorem 4.⁵

C. Proof of Theorem 2 

This subsection gives an outline of the proof of Theorem 2 for ease of understanding. 

As in [33, Section 4.5.1], the entries of $\mathbf{X}_k$ can be replaced by random variables bounded in absolute value by $\varepsilon_n\sqrt{n_k}$ without changing the LSD of $F^{\mathbf{B}_N}$. Here $\varepsilon_n$ is a positive sequence converging to zero. Also, following [33, Section 4.3.1], we apply a truncation to $\mathbf{R}_k$, $\mathbf{T}_k$, and $\bar{\mathbf{H}}_k\bar{\mathbf{H}}_k^H$ so that their spectral norms are bounded by a constant, say $a$ (see Appendix A.1 for details). As a consequence, Theorem 2 can be proved under the following additional conditions.

Assumption 5 For each $N$, the $X_{ij}^{(k)}$'s are i.i.d., and

$$\mathbb{E}\{X_{ij}^{(k)}\} = 0, \quad \mathbb{E}\{|X_{ij}^{(k)}|^2\} = 1, \quad |X_{ij}^{(k)}| \le \varepsilon_n\sqrt{n_k}, \tag{17a}$$

$$\max_{k=1,\ldots,K}\max\left\{\|\mathbf{R}_k\|, \|\mathbf{T}_k\|, \|\bar{\mathbf{H}}_k\bar{\mathbf{H}}_k^H\|\right\} \le a. \tag{17b}$$

For convenience, we still use $\mathbf{X}_k$, $\mathbf{R}_k$, $\mathbf{T}_k$, and $\bar{\mathbf{H}}_k$ to denote the truncated and centered matrices.

⁵ As an example, suppose that the eigenvalues of $\mathbf{R}_k$ are as follows: one equals $N^{1/2}$, $N/\log N$ of them equal $(\log N)^{1/2}$, and the remaining eigenvalues are bounded. In this case, $r_N = 1 + N/\log N$ and $a_N = c\sqrt{N}$, so $r_N\log(1 + a_N/\sigma^2) = O(N)$ rather than $o(N)$. Therefore, assumptions 2) and 3) of Theorem 4 are not satisfied. However, $\frac{1}{N}\mathrm{tr}(\mathbf{R}_k^2)$ is bounded.
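Write

$$m_{\mathbf{B}_N}(z) - \frac{1}{N}\mathrm{tr}\left(\boldsymbol{\Psi}(z)\right) = \left(m_{\mathbf{B}_N}(z) - \mathbb{E}\{m_{\mathbf{B}_N}(z)\}\right) + \left(\mathbb{E}\{m_{\mathbf{B}_N}(z)\} - \mathbb{E}\{m_{\hat{\mathbf{B}}_N}(z)\}\right) + \left(\mathbb{E}\{m_{\hat{\mathbf{B}}_N}(z)\} - \frac{1}{N}\mathrm{tr}\left(\boldsymbol{\Psi}(z)\right)\right), \tag{18}$$

where $\hat{\mathbf{B}}_N$ is obtained from $\mathbf{B}_N$ defined in (3) by replacing all the entries $X_{ij}^{(k)}$ of $\mathbf{X}_k$ with independent standard Gaussian random variables $\hat{X}_{ij}^{(k)}$. Here the $\hat{X}_{ij}^{(k)}$'s are independent of the $X_{ij}^{(k)}$'s. The proof of Theorem 2 then consists of the following three steps: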

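Step 1. By a martingale approach, we first prove that

$$m_{\mathbf{B}_N}(z) - \mathbb{E}\{m_{\mathbf{B}_N}(z)\} \xrightarrow{a.s.} 0. \tag{19}$$

Step 2. By the Lindeberg principle (Lemma 1 below) [38, Theorem 2], we show that

$$\mathbb{E}\{m_{\mathbf{B}_N}(z)\} - \mathbb{E}\{m_{\hat{\mathbf{B}}_N}(z)\} \to 0. \tag{20}$$

Step 3. The last step is to investigate the Stieltjes transform of $\hat{\mathbf{B}}_N$ and show that

$$\mathbb{E}\{m_{\hat{\mathbf{B}}_N}(z)\} - \frac{1}{N}\mathrm{tr}\left(\boldsymbol{\Psi}(z)\right) \to 0. \tag{21}$$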



Note that Steps 1 and 2 are completed under Assumptions 1-3, in which the $\mathbf{R}_k$'s and $\mathbf{T}_k$'s are general nonnegative definite matrices and $\bar{\mathbf{H}}$ may be nonzero. Appendix A.2 handles Step 1, while Step 2 is addressed in Appendix A.3, where we mainly make use of the generalized Lindeberg principle given below.

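Lemma 1 (Generalized Lindeberg Principle [38]) Let $\mathbf{v} = [v_i] \in \mathbb{R}^n$ and $\hat{\mathbf{v}} = [\hat{v}_i] \in \mathbb{R}^n$ be two random vectors with mutually independent components. Define $\{a_i\}_{1\le i\le n}$ and $\{b_i\}_{1\le i\le n}$ with

$$a_i = \left|\mathbb{E}\{v_i\} - \mathbb{E}\{\hat{v}_i\}\right|, \quad b_i = \left|\mathbb{E}\{v_i^2\} - \mathbb{E}\{\hat{v}_i^2\}\right|. \tag{22}$$

Then, for any thrice continuously differentiable function $f: \mathbb{R}^n \to \mathbb{R}$, we have

$$\left|\mathbb{E}\{f(\mathbf{v})\} - \mathbb{E}\{f(\hat{\mathbf{v}})\}\right| \le \sum_{i=1}^n\Big[a_i\,\mathbb{E}\left\{\left|\partial_i f(\mathbf{v}_1^{i-1}, 0, \hat{\mathbf{v}}_{i+1}^n)\right|\right\} + \frac{1}{2}b_i\,\mathbb{E}\left\{\left|\partial_i^2 f(\mathbf{v}_1^{i-1}, 0, \hat{\mathbf{v}}_{i+1}^n)\right|\right\} + \frac{1}{2}\mathbb{E}\left\{\int_0^{v_i}\left|\partial_i^3 f(\mathbf{v}_1^{i-1}, s, \hat{\mathbf{v}}_{i+1}^n)\right|(v_i - s)^2\,ds\right\} + \frac{1}{2}\mathbb{E}\left\{\int_0^{\hat{v}_i}\left|\partial_i^3 f(\mathbf{v}_1^{i-1}, s, \hat{\mathbf{v}}_{i+1}^n)\right|(\hat{v}_i - s)^2\,ds\right\}\Big], \tag{23}$$

where $\partial_i^p$ is the $p$-fold derivative in the $i$th coordinate, $\mathbf{v}_1^{i-1} = (v_1, \ldots, v_{i-1})$, and $\hat{\mathbf{v}}_{i+1}^n = (\hat{v}_{i+1}, \ldots, \hat{v}_n)$.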

Both Lindeberg's principle (Chatterjee [39], Korada and Montanari [38]) and the interpolation trick (see Lytova and Pastur [40] and Pan [37]) can be used to handle Step 2. However, Lindeberg's principle is simpler than the interpolation trick for proving this type of result.
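Steps 1 and 2 together say that, for large dimensions, $m_{\mathbf{B}_N}(z)$ does not depend on whether the entries of $\mathbf{X}_k$ are Gaussian. The hedged sketch below illustrates (20) numerically with $K = 1$ and $\bar{\mathbf{H}} = \mathbf{0}$. It estimates $\mathbb{E}\{m_{\mathbf{B}_N}(z)\}$ once with complex Gaussian entries and once with QPSK-like entries, using the same non-diagonal $\mathbf{R}$ and $\mathbf{T}$. NumPy is assumed. The sizes, the point $z$, the number of trials, and the random correlation matrices are illustrative choices. This is a numerical illustration, not a substitute for Lemma 1.

```python
import numpy as np


def sqrtm_psd(a):
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T


def random_psd(n, rng):
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    t = a @ a.conj().T
    return t * (n / np.trace(t).real)


def gaussian_entries(shape, rng):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2.0)


def qpsk_entries(shape, rng):
    return (rng.choice([-1.0, 1.0], size=shape) + 1j * rng.choice([-1.0, 1.0], size=shape)) / np.sqrt(2.0)


def mean_stieltjes(entries, R_half, T_half, z, trials, rng):
    """Monte Carlo estimate of E{m_B(z)} for B = R^{1/2} X T X^H R^{1/2}."""
    N, n = R_half.shape[0], T_half.shape[0]
    values = []
    for _ in range(trials):
        X = entries((N, n), rng) / np.sqrt(n)
        H = R_half @ X @ T_half
        B = H @ H.conj().T
        values.append(np.trace(np.linalg.inv(B - z * np.eye(N))) / N)
    return np.mean(values)


rng = np.random.default_rng(6)
N, n, z = 100, 150, -1.0 + 0.5j
R_half = sqrtm_psd(random_psd(N, rng))    # non-diagonal receive correlation
T_half = sqrtm_psd(random_psd(n, rng))    # non-diagonal transmit correlation
m_gauss = mean_stieltjes(gaussian_entries, R_half, T_half, z, 30, rng)
m_qpsk = mean_stieltjes(qpsk_entries, R_half, T_half, z, 30, rng)
print(m_gauss, m_qpsk, abs(m_gauss - m_qpsk) / abs(m_gauss))
```

The two estimates typically differ only slightly at these sizes. The residual gap reflects finite-$N$ effects, including the differing fourth moments of the two entry distributions.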

Owing to Lemma 1, the remaining task is to consider $\hat{\mathbf{B}}_N$, whose underlying random variables are standard Gaussian. In the remainder of this subsection, we show how Step 3 can be done case by case from the known results for Gaussian matrices.

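Before proceeding, we recall some useful results. Denote the spectral decomposition of $\mathbf{T}_k$ by $\mathbf{U}_k\mathbf{D}_k\mathbf{U}_k^H$, where $\mathbf{U}_k \in \mathbb{C}^{n_k\times n_k}$ is unitary and $\mathbf{D}_k$ is diagonal. Since $\hat{\mathbf{X}}_k$ is Gaussian, the joint distribution of the entries of $\hat{\mathbf{X}}_k\mathbf{U}_k$ is the same as that of $\hat{\mathbf{X}}_k$. Thus, $(\mathbf{R}_k^{1/2}\hat{\mathbf{X}}_k\mathbf{T}_k^{1/2} + \bar{\mathbf{H}}_k)(\mathbf{R}_k^{1/2}\hat{\mathbf{X}}_k\mathbf{T}_k^{1/2} + \bar{\mathbf{H}}_k)^H$ has the same distribution as

$$\left(\mathbf{R}_k^{1/2}\hat{\mathbf{X}}_k\mathbf{D}_k^{1/2} + \bar{\mathbf{H}}_k\mathbf{U}_k\right)\left(\mathbf{R}_k^{1/2}\hat{\mathbf{X}}_k\mathbf{D}_k^{1/2} + \bar{\mathbf{H}}_k\mathbf{U}_k\right)^H. \tag{24}$$

Hence, in the sequel we assume that the channel is of the form $\mathbf{R}_k^{1/2}\hat{\mathbf{X}}_k\mathbf{D}_k^{1/2} + \bar{\mathbf{H}}_k\mathbf{U}_k$.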

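Consider first condition 1) of Theorem 2 (i.e., the case $K = 1$). Denote the spectral decomposition of $\mathbf{R}_1$ by $\tilde{\mathbf{U}}_1\tilde{\mathbf{D}}_1\tilde{\mathbf{U}}_1^H$. For simplicity, we assume $\mathbf{S} = \mathbf{0}$.⁶ The joint distribution of the entries of $\tilde{\mathbf{U}}_1^H\hat{\mathbf{X}}_1$ is the same as that of $\hat{\mathbf{X}}_1$. Therefore, $(\mathbf{R}_1^{1/2}\hat{\mathbf{X}}_1\mathbf{D}_1^{1/2} + \bar{\mathbf{H}}_1\mathbf{U}_1)(\mathbf{R}_1^{1/2}\hat{\mathbf{X}}_1\mathbf{D}_1^{1/2} + \bar{\mathbf{H}}_1\mathbf{U}_1)^H$ has the same distribution as $(\tilde{\mathbf{U}}_1\tilde{\mathbf{D}}_1^{1/2}\hat{\mathbf{X}}_1\mathbf{D}_1^{1/2} + \bar{\mathbf{H}}_1\mathbf{U}_1)(\tilde{\mathbf{U}}_1\tilde{\mathbf{D}}_1^{1/2}\hat{\mathbf{X}}_1\mathbf{D}_1^{1/2} + \bar{\mathbf{H}}_1\mathbf{U}_1)^H$. Since

$$\mathrm{tr}\left(\left(\tilde{\mathbf{U}}_1\tilde{\mathbf{D}}_1^{1/2}\hat{\mathbf{X}}_1\mathbf{D}_1^{1/2} + \bar{\mathbf{H}}_1\mathbf{U}_1\right)\left(\tilde{\mathbf{U}}_1\tilde{\mathbf{D}}_1^{1/2}\hat{\mathbf{X}}_1\mathbf{D}_1^{1/2} + \bar{\mathbf{H}}_1\mathbf{U}_1\right)^H - z\mathbf{I}_N\right)^{-1} = \mathrm{tr}\left(\left(\tilde{\mathbf{D}}_1^{1/2}\hat{\mathbf{X}}_1\mathbf{D}_1^{1/2} + \breve{\mathbf{H}}_1\right)\left(\tilde{\mathbf{D}}_1^{1/2}\hat{\mathbf{X}}_1\mathbf{D}_1^{1/2} + \breve{\mathbf{H}}_1\right)^H - z\mathbf{I}_N\right)^{-1} \tag{25}$$

with $\breve{\mathbf{H}}_1 = \tilde{\mathbf{U}}_1^H\bar{\mathbf{H}}_1\mathbf{U}_1$, it suffices to prove (21) with

$$\hat{\mathbf{B}}_N = \left(\tilde{\mathbf{D}}_1^{1/2}\hat{\mathbf{X}}_1\mathbf{D}_1^{1/2} + \breve{\mathbf{H}}_1\right)\left(\tilde{\mathbf{D}}_1^{1/2}\hat{\mathbf{X}}_1\mathbf{D}_1^{1/2} + \breve{\mathbf{H}}_1\right)^H. \tag{26}$$

⁶ The extension to the case with $\mathbf{S} \neq \mathbf{0}$ is straightforward.

The convergence of (21) with $\hat{\mathbf{B}}_N$ in (26) was reported in [29, Section 3.2]. Indeed, in [29], the $X_{ij}$'s are i.i.d. complex random variables with finite fourth moment and the variance profile is separable. In our case, the $\hat{X}_{ij}^{(1)}$'s are standard Gaussian random variables, and the variance profile of $\tilde{\mathbf{D}}_1^{1/2}\hat{\mathbf{X}}_1\mathbf{D}_1^{1/2}$ is separable. Hence, Step 3 is completed under condition 1) of Theorem 2.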

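Next, we turn to condition 2) of Theorem 2 (i.e., the case $\bar{\mathbf{H}} = \mathbf{0}$). In view of (24), it is enough to prove (21) with

$$\hat{\mathbf{B}}_N = \mathbf{S} + \sum_{k=1}^K \mathbf{R}_k^{1/2}\hat{\mathbf{X}}_k\mathbf{D}_k\hat{\mathbf{X}}_k^H\mathbf{R}_k^{1/2}. \tag{27}$$

The convergence of (21) with $\hat{\mathbf{B}}_N$ in (27) follows immediately from [7, Corollary 1]. Therefore, Step 3 is finished under condition 2) of Theorem 2.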

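Finally, we consider condition 3) of Theorem 2 (i.e., diagonal $\mathbf{R}_k$'s). In the remainder of this subsection, the $\mathbf{R}_k$'s are diagonal nonnegative matrices. From (24), it suffices to prove (21) with

$$\hat{\mathbf{B}}_N = \mathbf{S} + \sum_{k=1}^K \left(\mathbf{R}_k^{1/2}\hat{\mathbf{X}}_k\mathbf{D}_k^{1/2} + \check{\mathbf{H}}_k\right)\left(\mathbf{R}_k^{1/2}\hat{\mathbf{X}}_k\mathbf{D}_k^{1/2} + \check{\mathbf{H}}_k\right)^H, \tag{28}$$

where $\check{\mathbf{H}}_k = \bar{\mathbf{H}}_k\mathbf{U}_k$. The following theorem contributes to Step 3 in this case.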

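Theorem 5 Consider channel matrices of the form $\mathbf{H}_k = \mathbf{R}_k^{1/2}\hat{\mathbf{X}}_k\mathbf{D}_k^{1/2} + \bar{\mathbf{H}}_k$ for $k = 1, \ldots, K$, where the $\mathbf{R}_k$'s and $\mathbf{D}_k$'s are diagonal nonnegative matrices. Assume that the spectral norms of the $\mathbf{R}_k$'s, $\mathbf{D}_k$'s, and $\bar{\mathbf{H}}_k$'s are all bounded and that the $\hat{X}_{ij}^{(k)}$'s are i.i.d. standard Gaussian random variables. Let $\mathbf{H} = [\mathbf{H}_1 \cdots \mathbf{H}_K]$ and $\bar{\mathbf{H}} = [\bar{\mathbf{H}}_1 \cdots \bar{\mathbf{H}}_K]$. Then, as $\mathcal{N} \to \infty$, for any $z \in \mathbb{C} - \mathbb{R}^+$,

$$\frac{1}{N}\mathbb{E}\left\{\mathrm{tr}\left(\mathbf{H}\mathbf{H}^H - z\mathbf{I}_N\right)^{-1}\right\} - \frac{1}{N}\mathrm{tr}\left(\boldsymbol{\Xi}(z)\right) \to 0, \tag{29a}$$

$$\frac{1}{N}\mathbb{E}\left\{\mathrm{tr}\left(\mathbf{H}^H\mathbf{H} - z\mathbf{I}_n\right)^{-1}\right\} - \frac{1}{N}\mathrm{tr}\left(\tilde{\boldsymbol{\Xi}}(z)\right) \to 0, \tag{29b}$$

where

$$\boldsymbol{\Xi}(z) = \left(\boldsymbol{\Theta}(z)^{-1} - z\bar{\mathbf{H}}\tilde{\boldsymbol{\Theta}}(z)\bar{\mathbf{H}}^H\right)^{-1}, \tag{30a}$$

$$\tilde{\boldsymbol{\Xi}}(z) = \left(\tilde{\boldsymbol{\Theta}}(z)^{-1} - z\bar{\mathbf{H}}^H\boldsymbol{\Theta}(z)\bar{\mathbf{H}}\right)^{-1}, \tag{30b}$$

$$\tilde{\boldsymbol{\Theta}}(z) = -\frac{1}{z}\,\mathrm{diag}\left(\left(\mathbf{I}_{n_1} + \frac{1}{n_1}\mathrm{tr}\left(\mathbf{R}_1\boldsymbol{\Xi}(z)\right)\mathbf{D}_1\right)^{-1}, \ldots, \left(\mathbf{I}_{n_K} + \frac{1}{n_K}\mathrm{tr}\left(\mathbf{R}_K\boldsymbol{\Xi}(z)\right)\mathbf{D}_K\right)^{-1}\right), \tag{30c}$$

$$\boldsymbol{\Theta}(z) = -\frac{1}{z}\left(\mathbf{I}_N + \sum_{i=1}^K\frac{1}{n_i}\mathrm{tr}\left(\mathbf{D}_i\left(\tilde{\boldsymbol{\Xi}}(z)\right)_i\right)\mathbf{R}_i\right)^{-1}. \tag{30d}$$

Proof: Note that $\mathbf{H} = [\mathbf{R}_1^{1/2}\hat{\mathbf{X}}_1\mathbf{D}_1^{1/2} \cdots \mathbf{R}_K^{1/2}\hat{\mathbf{X}}_K\mathbf{D}_K^{1/2}] + \bar{\mathbf{H}}$ corresponds to the channel in [29] with a general variance profile. Therefore, this theorem follows immediately from [29, Theorems 2.4 and 2.5] and the dominated convergence theorem.⁷ □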
The proof of (21) under condition 3) of Theorem 2 is a result of Theorem 5. The idea is to cast the 
model HH^ + S into an extended model such that it fits into the framework of (29). To this end, write 

K+l 

HH^ + S = ^ HfcHjf , 

k=l 

- - - = 1 
where = A'^D^ + is given in (28) and Hr+i = 82 (without a random component). Plugging 

this model into (30a) and (30b), we obtain 



H(z) = (@{zy^ - zH©(z)H^ + S) \ 



S{z) 



0(z)-l 













— z 


si 





-zIn 





-1 



0(z) 



H S2 



(31) 
(32) 



^The dominated convergence theorem is due to the expectation involved in (29). 
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Note that S{z) is now a matrix of size (n + N) x (n + A^). From (30d), write 



K 

&{z)-' + S = - ^ -tr (l)i{S{z))i) IU-zIn + S. 



(33) 



1=1 



Applying Lemma 12 in Appendix D, we can obtain the n x n principal submatrix of S{z) as 



[S{z)]i.,n,l:r. 



@{z)-' - zu^e{z)u - z^we{z)s 



zIn — z, 



S50(z)S5) ^S5e(z)H 



&{zy^ - zH" (^&{z) + z@{z)S^ (^-zIn - zS^&{z)S^^ S^@{z)j H 
&{z)-^ - zH" {&{z)-^ + S)"^ H 



(34) 



where the third equality is due to the matrix inverse lemma (Lemma 12 in Appendix D). Plugging (33) 
and (34) into (30a)-(30d) and recovering the effect from the eigenvectors U^'s, we obtain the formu- 
las (12a)-(12d). In particular, defining U = diag(Ui, . . . , Ux), we have ^{z) = @{z)~^ + S, #(z) = 
U^[S(2;)]i:n,i:nU, ®{z) = ^{z), and @{z) = U^*(z)U. By (29a), we immediately establish (21). 

For the general case with nontrivial H, R^'s are required to be diagonal so that [29, Theorem 2.4] can 
be used immediately to yield Theorem 5. If Rfc's are generally nonnegative definite, one may wonder if 
the same Stieltjes transform method in [29] can still be used to get the similar result. At present, due to 
mathematical difficulties, this is still an open challenge and such development is ongoing. 



IV. Simulation Results 

In this section, computer simulations are provided to evaluate the reliability of the asymptotic result par- 
ticularly when the channel entries are non-Gaussian. Specifically, we compare the analytical result VAr((J^) 
(15) with the Monte-Carlo simulation results of the ergodic mutual information E{Vbjv(c^)} obtained from 

averaging over a large number of independent realizations of H. 

1 1 _ 

Given the Kronecker MIMO channel model Hfc = R^XfcT| -|-Hfc for UE^, the simulation settings used 
in this study are based on the following assumptions. First, the spatial correlation is generated from a 
uniform linear array with half wavelength spacing in a wireless scenario. The propagation path cluster is 
assumed to have a Gaussian power azimuthal distribution, which is characterized by the mean angle and 
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the root-mean-square spread [25]. Second, the channel gain from UE^ to each receiving antenna Rfc and 
its LOS components Hfe are generated randomly. Third, the i.i.d. entries of X^'s are assumed to be of 
the form = exp(_7^j^^^) [43], where 0^^^ is the phase modeled as a uniform distribution 

over [0, 27r], and is the random amplitude drawn from a distribution with normalized mean power, 

i.e., E{[VF^^^^]^} = 1. The typical probability distributions for modeling the amplitude behavior include 
the Rayleigh, Nakagami, and log-normal distributions [4,5]. Among them, the Nakagami distribution is 
arguably the most general model that embraces the Rayleigh distribution and those having longer tails. 
On the other hand, the log-normal distribution is well known to be a suitable model for slowly varying 
communication channels, e.g., indoor radio propagation environments. 

To measure the fading severity of the channel model, we adopt the coefficient of variance (CV) as a 
performance metric, which is defined by [43] 



with var{VF|]^'^} being the variance of W^-^' . According to [43], the variation in ergodic mutual information 
can be significant if the values of CV are different. Note that the CV for Rayleigh fading channels is 0.526 
and any CV value much greater than this reference point indicates a severe level of fading. For Nakagami 
fading, fading is severe if the Nakagami m-factor is very small. However, the m-factor is greater than 0.5 
[44], which gives a possible range for the CV values only in [0,0.7555]. Therefore, we use the log-normal 
distribution to generate a fading channel with very severe fading by setting a large value for CV. 

Under a different fading severity. Figures 2 and 3 show the results of E{Vbjv(o"^)} and VAr((T^) for the 
cases with H = and H 7^ respectively. As we can see, when the number of antennas grows large (e.g., 
N = n\=n2 = 16) all curves almost overlap regardless of the distributions or the CV values. The ergodic 
mutual information is more sensitive to the type of distribution as well as the CV value for the scenarios 
with small number of antennas. Thus, this invariance phenomenon of the ergodic mutual information in 
the large-system limit agrees with our analysis. Also, one can observe that the case H / exhibits less 
sensitivity to the type of distribution, even for a small number of antennas because half of the energy has 
contributed to the LOS components which has nothing to do with mitigating the fading distributions. 

Next, we evaluate the variance of Vb^vI*^^) by numerical simulations. Using the same parameters as 



CV = 




(35) 
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Asymptotic analysis 

Monte- Carlo simulations (Gaussian, CV=0.526) 
Monte- Carlo simulations (Nakagami, m=V2, CV=0.7555) 
Monte- Carlo simulations (Lognormal, CV=0.7555) 
Monte- Carlo simulations (Lognormal, CV=0. 526x3) 




Asymptotic analysis 

Monte- Carlo simulations (Gaussian, CV=0.526) 
Monte- Carlo simulations (Nakagami, m=V2, CV=0.7555) 
Monte- Carlo simulations (Lognormal, CV=0.7555) 
Monte- Carlo simulations (Lognormal, CV=0. 526x3) 




(a) N — ui — n2 — 2 



1/^2 (dB) 

{b) TV = m = n2 = 16 



Figure 2: Ergodic mutual information versus SNRs for the SCNs with H = and a) = ni = n2 = 2 and 
b) iV = ni = n2 = 16. The solid hnes plot the analytical results, while the markers plot the exact results. 



Asymptotic analysis 

Monte- Carlo simulations (Gaussian, CV=0.526) 
Monte- Carlo simulations (Nakagami, m=V2, CV=0.7555) 
Monte- Carlo simulations (Lognormal, CV=0.7555) 
Monte- Carlo simulations (Lognormal, CV=0. 526x3) 




Asymptotic analysis 

Monte- Carlo simulations (Gaussian, CV=0.526) 
Monte- Carlo simulations (Nakagami, m=V2, CV=0.7555) 
Monte- Carlo simulations (Lognormal, CV=0.7555) 
Monte- Carlo simulations (Lognormal, CV=0. 526x3) 




l/a^ (dB) 

(b) N — ni — 712 



16 



Figure 3: Ergodic mutual information versus SNRs for the SCNs with H 7^ and a) = ni = 712 = 2 and 
b) = ni = n2 = 16. The solid lines plot the analytical results, while the markers plot the exact results. 
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3.5 r 



Lognormal 




Figure 4: The empirical variance versus CV when H = and = 30 (dB). The above plot corresponds 
to a small system while the below plots correspond to the large system of interest. 



those in Figure 2, Figure 4 shows the empirical variances against CV when l/u^ = 30 (dB). We observe 

that as the number of receive antennas grows large, the variance of VB^i.^"^) becomes small, or the mutual 

information approaches to a deterministic value in the large-system limit. The scenario with = 16 

and ni = n2 = 2 particularly corresponds to typical SCNs, where the transmitter has a small number of 

antennas while the receiver is composed of large number of antennas. This validates the practice of the 

deterministic approximation in the SCNs. 

The CLT of Vbjv(<7^) has been recognized for different models by Moustakas et al. [25], Taricco [26], 

and Hachem et al. [45,46]. Although the CLT is beyond the scope of this paper, we find it important to 

clarify some properties of the variance of Vbjv('^^)- the large system of interest (e.g., N = ni = n2 = 16 

or N = 16, ni = n2 = 2), it is noted that the log- normal distribution undergoes the highest variance. In 

addition, the curves of variance diverge as the CV value increases. Clearly, the CV does not provide a 

proper metrology neither for the mean nor the variance of Vbjv (<^^) the large system limit.^ In this paper, 

we have shown that VAr(cr^) depends on the second moment of the variables X^j^^s. As a consequence, the 

mean of the mutual information is invariant to the type of fading distribution in the large system limit. 
®The insightful finding is due to A. Moustakas. 
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Under a simpler model (where the correlation matrices are diagonal), it has been pointed out recently in 

[46] that the variance of mutual information depends not only on the second moment but also on the fourth 

(k) 

moment of the variables X^^^s. This conjecture might be true in the SCNs of interest but at present, the 
required CLT to address the cases where the correlation matrices are generally nonnegative definite and 
the channel entries are non-Gaussian is not at all understood. 

Large-system results have been widely used to design the optimal input covariance [7, 17, 18, 25]. With 
Gaussian channel entries, [18] showed that the input covariance design based on the large-system results can 
provide indistinguishable results to that achieved by stochastic programming (or the Vu-Paulraj algorithm 
[47]), even for the cases with a small number of antennas. It is important to know if such good characteristics 
still holds when the channel entries are non-Gaussian. We clarify this property over the case with H = 0. 
In this case, the ergodic mutual information is more sensitive to the type of distribution and an iterative 
water-filling algorithm based on the large-system results can be used to obtain the asymptotic optimal input 
covariance [7, Table 11].^ For stochastic optimization, the Vu-Paulraj algorithm based on the barrier method 
is used, in which the average mutual information and their first and second derivatives are calculated by 
Monte-Carlo methods with 10^ trials. The number of iterations for the barrier method is set to 10. In 
Figure 5, we evaluate E{Vb;v('^^)} when the input covariance matrices are obtained using the large-system 
results and the stochastic optimization. As shown, the asymptotic approach provides indistinguishable 
results to that achieved by stochastic programming in the case of non-Gaussian fading channels. 

V. Conclusion 

This paper provided the deterministic equivalent of the LSD to deal with the channel matrices of SCNs 
where the entries of the MIMO channel matrix are no longer limited to be Gaussian distributed. Also, the 
correlation effects (caused by insufficient antenna spacing) and the LOS components (due to low antenna 
heights) are included in the analysis. Using the deterministic equivalent of the LSD, we analyzed the 
Shannon transform of this class of large dimensional random matrices and showed that the ergodic mutual 
information of the random matrices under investigation is invariant with respect to their distributions. As 
a byproduct, we proved that the deterministic equivalents of the MIMO MAC in [7] are true even if the 

entries of the channel matrix are non-Gaussian and R^'s and Tj^'s are nonnegative definite. 
®If H y^: 0, a similar iterative algorithm based on the large-system result was provided in [48]. 
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12r 




1/0-2 (dB) 

Figure 5: Achievable rates versus SNR with N = ni = n2 = 2. The (red) hues plot the results based on 
the large-system results, and the marker points plot the results for stochastic optimization. 

Appendix A. Proof of Theorem 2 

A.l Truncation, Centralization, and Rescaling 

We begin the proof of Theorem 2 by replacing the entries of {^k}i<k<K and that of the spectral decom- 
positions of {Rfc, Tjt, Hfc}i<fc<;^ with truncated (and centralized) variables. It suffices to prove that the 
difference between the ESD of B^r and the one of truncated B^v converges to zero with probability one 
because such convergence is equivalent to the convergence of their Stieltjes transforms. 

We first follow a line similar to that in [33, Section 4.3] to truncate the spectral decompositions of 
{Rfc, Tfe, HfcH|^}i<fc</4'. For any nonnegative definitive matrix A € C"*^™, introduce its spectral decom- 
position and the corresponding truncation as follows: 

A = U^diag (Ai(A), . . . , A^(A)) U^, 

^ A" = UAdiag (Ai(Tfc)l(Ai(A)<a), • • • , Am(A)l(A„(A)<a)) , 

where Aj(A) denotes the ith largest eigenvalues of A and q > 0. Also, for the rectangular matrix H^, 
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define its singular value decomposition and the corresponding truncation version as follows: 



(37) 



where is obtained from Sjt with each singular value x being replaced by X^{x<a)- Let Xii^k) denote 
the ith largest singular value of Hfe. By Lemma 6 and iv) of Lemma 4, we have 



sup 

X 



F /_ 1 1 1 1 1 \n{x) 



E (^^"l^ (Sfe - H^) + rank (r| - (r|)") + rank (t| - (T|)")) 

^ K / N N nu \ 

= ]7 E E l(x^{H.)>a) + E VA,(Rf )>«) + E VA.(T^/^)>a) 
k=l \i=l i=l ^ ' i=\ ^ ' / 

= E (%,Hf +^R.((«'>oo)) + ;^^T,((a^oo))^ . 



(38) 



The right-hand side of the inequality above can be made arbitrary small if a is large enough by Assumption 
3. Therefore we can assume that the eigenvalues of {Rfe, T^, HfcH-^}i<fe<x are bounded by a constant a. 

Next, we truncate and centralize the entries of Xj^. As pointed out at Remark 1, the assumption 
E{X^^^} = can be removed from Theorem 2 if xf^^^ have the same mean. For this reason, we do not 
make the zero mean assumption in the subsequent analysis. For each A;, let 



X 



(k) 
11 



11 "(ix'f |<£„Vn^) 



xi1^=xi1^-E{Xll>}, 



(39) 



where 



SniQ, and £„2E||xg)|2i^|^(,)|^^^^^}^0. 



(40) 



Also, define = [-^^X^i-^] G C^^"'^ and X^ = [^^X^i-^] e C^^"S and Bjv and Bjv (obtained from 
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(k) ^ (k) ~ (k) 

B jv with X^j replaced by X-^ and X^ , respectively) . By Lemma 6 and iv) of Lemma 4, we obtain 



sup 

X 



1 ^ 1 ^ 



k=l ij 



k=l 



^ 0, (41) 



where the last step can be obtained in the same way as in [33, Section 4.3.2]. Repeating the first inequality 
in (41) with Fb^^ (^) replaced by (x) yields^° 



sup 

X 



Similarly, we may show that re-normalization of X^j'' does not affect the LSD of {x) as in [33, Section 
3.2]. 

Therefore, henceforth, wc consider that Assumption 5 holds. For ease of reading, we recall this as- 

.(fc) 



0. 



(42) 



sumption here: For each of A'', X^j' are i.i.d., and 



{xif}=0, E{|xg)|2} = l, \xll^\<enV^k, 



max max{||Rfc||, ||Tfc||, ||HfcH^ ||} < a. 

k=l,...,K 



(43a) 
(43b) 



For convenience, we still use X^, T^, Rfe, and to denote those truncated and centralized matrices. 



A.2 Proof of Step 1 



The aim in this subsection is to prove that 



{|mB,(^)-E{mB,(z)}|2f} = o(^i^^ for any p>2, 



(44) 



which, together with Borel-Cantelli's lemma, ensures Step 1. For ease of explanation, we prove the case 
with K = 1 only but the similar procedure can be easily extended to the case with K > 1. For this reason, 
we omit the index k in the following procedure. 

Let Xj denote the jth column of X, be the column vector with the jth element being 1 and otherwise 



"Note that rank(Xfc - X^) = rank(E{Xfc}) < 1. 
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0, and set 

X(,.)=X-x,eJ. (45) 



Furthermore, we find it useful to define 



m^A^) = -^tr {Bm - zIn)-^ , (46) 



(z) = ltr(B(,)-zI^)-\ (47) 



wliere B(j) = (R2X(j)T2 + H)(R2X(j)T2 + H)-^^ + S. Also, wc use Ej to denote conditional expectation 
given Xj+i, . . . ,x„, so that Eo{mB^(2;)} = iriBNiz) and Ejj{mB^(^)} = E{mBr^{z)}. Therefore, we have 

n 

mBj,{z) - E{mBj,iz)} = ^[Ej_i{mB^(2;)} - Ej{mB^(2;)}] 

n 

= 5^[E,_i - E,]{tr ((B^ - zIn)-') - tr ((B(,.) - ^I^v)-')} 



1 

iV 



1 

M Ht^i " ^J-iH7ii + 7i2 + 7i3 + 7i4 + 7i5}, (48) 



where 



7ji = -ejTXg)Ri(B(j) - zI)-\Bn - ziy'li^^j, (49a) 

7,-2 = -xf (B^r - ^I)"'(B(,-) - zI)-iR^X(,)Te,-, (49b) 

7^-3 = -ejTe^xf R5(B(j) - 2l)-\Bjv - 2l)"^R5xj, (49c) 

7j4 = -ejTiH^(B(j-) - zI)-\Bjv - zI)~^R^x^-, (49d) 

7j5 = -xf Ri(B^ - zI)-\B^j) - ziy'HThj. (49e) 



In (48), we have used the resolvent identity (see Lemma 3), (45) and 



Bn - B(j) = Raxj-ej TXg)R2 + R2X(j)Tejxf R2 



+ R^XjejTejxf R3 + R5xjejT^H^ + HTie^xf R^ . (50) 



Since the mathematical treatments for 7^-4 and 7^5 are similar, we here consider 7^-5 only. Starting from 
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the Cauchy-Schwartz inequality and then applying 3) of Lemma 2, we get 



< ||R^||||(Bjv-^I)~i||(B(^-) -2l)~i||H||||T^||||xj-||. 



(51) 



Lemma 10 gives 



E{||xjfP} = 0(1) foranyp>2. 



(52) 



Note that ||R||, ||T||, ||H||, ||(Biv - zI)-^\\ and ||(B(j) - ziy^W are all bounded. Hence, we have 



E{|7j5r^} = 0(1) for any p> 2. 



(53) 



Then, by Lemma 9, we can show that for any p > 2, 



1 " 

-^[E,-E,_i]{7,5} 



2p> 



< 



EM') }s^tMM*} = o(j^). (54) 



where the second inequality follows from Lemma 7 and the last equality is due to (53). 

Next, we consider 7^-3. Let dist(2;,M+) stand for the Euclidean distance between z and M+. Since 



Bat — zl) ^11 and IKB^^) — zt) ^\\ are both bounded by 



dist(2,M+) 



, using Lemma 2 gives 



bj3\ < \ejTej\ ||R^(B(^-) - zI)-\Bn - zI)-^R^ || ||xjf < 



|T|| IIRII 



dist(z,M+) 



2 ii-^jl 



(55) 



In addition, a simple application of Lemma 10 gives E{|7j3|*'} = 0(1) for any p > 2. Then, applying the 
same arguments as in (54), we have that for any p>2, 



1 " 

-^[E,-E,_i]{7,3} 



(56) 



As the procedures for 7^1 and 7^2 are similar, we take 7^1 as an example. Using Lemma 2, we have 



|7,i| < l|ejTXg)|| ||R5(B(,) - zI)-\B - ziy^K^ ||||x,-|| < J'^l , ||ejTXg)|| ||x,-| 



dist(2;,M+)' 



(57) 
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Let Xj be the ith column vector of X^^ . It is easily verified that 



E|xfTe,e^Txi| < -e^T^e,- < -||T||2 < -a'^ 



(58) 



Then, 



E{||eTTX^^lPH^=^E 



(&) 



^xfTe.ejTx, 

n 

^ (xf Te.ejTx, - E {xf Te.ejTx,} ) 



i=l 



^E{xfTe,ejTx.} 



i=l 



<C7p^ E {|xf Te.ejTx, - E {xf Te.ejTx,} 



i=l 



+ ^p{fl^[ |xf Te.ejTx, - E {xf Te.ejTx,} |'| j + C^a^P, 



(59) 



where (a) is due to the fact that X^^X(j) = X]"=iXjX^, and (b) follows from Lemma 8 and (58). Prom 
Lemma 7 and (58), we get 



E{|xf Te.ejTx, - E {xf Te.ejTx.} [} < 2^-' (^E { |xf Te,ejTx,[} + -^^a^pj 



(60) 



Substituting this into (59), we obtain 



1=1 \«=i 



xfTe.-ejTxi 



+ C. (61) 



Moreover, since we know that 



Lemma 10 gives 



E{|x|^Te,|2^'}=o(^l) foranyp>l. 
From this and (61), it follows that 

E{l|ejTXg)||2-} = 0(l). 



(62) 



(63) 



(64) 
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By applying the independence between X(j) and Xj and using (52), (57), and (64), we have 



E{l7,iPn<T 



dist(z,M+)^^ 



{H^^)f'} E{||x,||^n = 0(l). 



Therefore, using Lemma 9 with the above, we have, for any p > 1, 



E < 



1 " 

-5;[E,-E,_i]{7,i} 



2p - 



< 



Cp 



E < 




(44) then follows from (54), (56), and (66). The proof is complete. 



O 



NP 



(65) 



(66) 



A.3 Proof of Step 2 



To begin with, recall the definition: 



Bat = S + (|r^XT^ +h) (|r^XT^ +h)^, 



(67) 

(68) 



where X and X are matrices with entries satisfying (43a) but X is Gaussian. The aim here is to prove 



|E{mB,(z)}-E{mB,(z)}|=0(£„), 



(69) 



As before, we will prove the case K = 1 only and drop the unnecessary index A; in the sequel. 

The strategy is to use Lemma 1, the Lindeberg principle [38, Theorem 2]. As pointed out at the end 
of the first paragraph of Appendix A, E{Xij} = E{JYij} = 0. Also we have E{|Xjjp} = E{|^jjp} = 1. 
Therefore Oj = 6j = in (22). We next evaluate the second and third lines of (23). To achieve this, 
we need to take the derivatives with respect to the real and imaginary parts of the entries of X, 

respectively. Because the real and imaginary parts of Xij are independent, all the results established in 
the real case can be directly applied for the complex case. Thus, without loss of generality, we deal with 
X and X with real entries only in order to present the formulas in a compact and succinct way. 
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For ease of exposition, we define 



1 



/(A) ^ -^tr (s+ (^RiAxi + h) (^R^ AT^ + h)^ - zI 



(70) 



where A is any matrix such that the product R2AT2 exists. As such, we have m-Bjs,{z) = /(X) and 
miSfi{z) = f{X). Moreover, to apply (23), A will take the form A(r, c, s) = [Aij{r,c,s)\ G C^^" with 



■^ijir^ C, s) 



if i < r, or z = r and j < c, 
s, if i = r, and j = c, 
, otherwise. 



(71) 



Further, let G = S + (^R^ AT^ + (^rI AT^ + , denote the partial derivative with respect to Aij 
by dij, and let Ejj be the matrix with a 1 in the (i, j)th position and O's elsewhere. To get the third-fold 
derivative of /(A), we rely on the following differentiation formulas: 

dij{G - zl)-^ = -(G - zl)-^{dijG){G - zl)'^, (72a) 

dijG = (R^/^EijT^) (|r^ AT^ + h)^ + (r^AT^ + h) T^E^jR^, (72b) 

dfjG = 2TjjR^EiiR^ , (72c) 

dfjG = 0. (72d) 

By (72), one can easily show that 



dijfi^) = - ((a,,G)(B^ - zl)-') , 

ay (A) =|tr {{dijG){Bj, - zir\d,jG){B^ - zl)-') ^tr {{df^G){Br, - zl)-') , 



N 



6 



ay {A) = - -tr ((5i,G)(Bjv - zI)-\aijG){BN - zI)-\dijG){BN - ^I)"') 



N 
3 



+ -tr ((4G)(B^ - zI)-i(a,,G)(Bjv - zl)-^) 



N 
3 



+ -tr ((a,,G)(B^ - zI)-H4G)(B^^ - zl)-') . 



(73a) 
(73b) 



(73c) 



Now, we provide a bound for each of the three terms of afjf{A) (see (73c) above). The first term of 
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dfjf{A) can be bounded by 

|tr {idijG){BN - zI)-\dijG){BN - zl)-\dijG){BN - zl)-^) \ 
<\\iidijG)iBN - zI)-^)% \\idijG){BN - zI)-^\\f (74) 

(6) 1 

where (a) follows from l)-i) of Lemma 2 and the remaining two inequalities, (b) and (c), follow from l)-ii) 
and l)-iii) of Lemma 2. By l)-i) and l)-ii) of Lemma 2, the second and third terms of dfjf{A) can be 
bounded by 



|tr ((4G)(Bjv - zI)-\dijG)iBN - zl 
Therefore, to estimate \df,f{A)\, we note that HTj-jR^EjiR^ ||f = TjjRu < ||T||||R|| < and 



^^.3 mjG)\\p mjG)\\p = ^^^||r,-,-R^E,,R^||F ||(a,,-G)||F. (77) 



(a) 



<||R||||T2H^HT2||tr(ej-eJ) 



<||T||||R||||H^H|| < a^ (78) 



where (a) follows from l)-iv) and 2) of Lemma 2, and (b) follows from 3) of Lemma 2. As a result, 

mjG)\\l <4 (||R^Ei,-TA^Ri||^ + ||R^Ei,TiH^||^) 
<4 (tr (K^EijTA^RATBjiR^^ + a^) 

(c) 

<4 (||Rf tr (ejTA^ATe^) + a^) 

= 4(^\RfJ2\kiTejf + a^y (79) 
where (a) follows from the triangle inequality of the Probenius norm and Lemma 7, (b) follows from (78), 
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(c) follows from l)-iv) and 2) of Lemma 2, and represents the ith row vector of A. 

Recalling the definition of A(r, c, s) in (71), when i ^ r, a direct application of Lemma 8 yields 



{l^Te,f"} 



k=l 

' n 



2p- 



\k=l 



When i = r, similarly, we have 



|2p 



. feT^C 



From the definition of A(r, c, s) in (71), we get 



2p-2 

^"^ , if z < r, or i = r and j < c, 
\s\'^P, ii i = r, and j = c, 



^'~^t!p' ' otherwise. 



Then, the above gives the simple bound E{|^ijp*'} < ^ for z 7^ r and j 7^ c. Note that 



E l^^^f ' ^ E l^'^.f = (-JT^-,)^ < llTf ^ < 



k=l 



^k=l 



Prom (82) and (83), we get, when i ^ r, 



jfc=i fc=i ^ ' 



and similarly, when i = r, 



k^c ^ ^ 
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Therefore, using (80) and (81) with the above bounds, we have 



E{|a,Te,f} 



0{^) + \s\^ ifi = r, 
O (^) , otherwise. 



(86) 



which, together with Lemma 8, ensures that 



N 



^|a,Te,f -E{|a,Te,|2} 



N 

< E{||a,Te,p - E {|a,Te,p}|^} = 0(1). (87) 



Combining everything together, we get 



(«)C 



N 



E{\d^J{A{r, c,s))\}<-E\[C' + Y^ |a,Te, 



<^ I C' + \sf + E( 



<^\C' + \s\' + E{ 



N 

^|a,Te,| 



N 



^|aiTe,|2-E{|a,Te,|2} 



< — 
-N 



C + |sP + I E 



V 



TV 



^|a,Te,f -E{|a,Te,f } 



c 



N 



<J^ iC' + \sf), 



(88) 



where (a) follows from (79), (b) follows from Lemma 7, (c) is due to the fact that (E{| • |^})p is a nonde- 
creasing function of p, and (d) follows from (87). 
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Finally, we can evaluate the second and third lines of (23). Using (23) and (88), we have 



|E{K{mB,(z)}}-EMmB,(z)}}| < ^EE N (^^' + 1^1') {J^' ^4 

'^"^.{f'"v.N.(^-.)^.}) 

(89) 

The quantity \^\^{m^^{z)Y) — E{S>{r?7,Bjy (z)}}| also admits the same upper bound. Therefore, we finish 
Step 2. 

Appendix B. Existence and Uniqueness 

In this appendix, we will consider existence and uniqueness of the solution to (11). 
Appendix B.l Existence 

As pointed out in the paragraphs after Theorem 5, Formulas (12a)-(12d) can be obtained from those of 

[29, Theorems 2.4 and 2.5] by recovering the effect from the eigenvectors U^'s. Therefore existence of ei{z) 
follows from that of the corresponding solution '^i{z) of [29, Theorems 2.4]. 

Appendix B.2 Uniqueness 

In fact, uniqueness of ej(z) also follows immediately from that of V'i(^) in [45, Theorems 2.4]. However, 
here we provide an alternative proof, which is inspired by [7, 18]. 

"Note that K{/(A)} is a smooth function and |9?.5R{/(A)}| < |af^/(A)| for each p. 
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For the reader's convenience, we recall the notation introduced in Theorem 1: 



ei(^) = -^tr(Ri*(z)), (90a) 
ei{z) = -tr (Ti{^{z))i) , (90b) 



where 



^{z) = (*(2)"^ - zU^{z)U^^ ^ , (91a) 
*(z) = (*(z)~^ - zH^*(2)h) , (91b) 

^{z) = — (--S + J2ei{z)Ri + lN^ , (91c) 

^ V ^ i=l / 

*(z) = diag (*i(z), . . . , ^k{z)) , (91d) 

with *j(z) = -i(I„, +/3iei(z)Tj)-^ Let = [zei{z) ■ ■ ■ zexiz)]^ , = [zei{.z) ■ ■■zSKiz)]'^, = z^{z), 
= z^{z), = z^i(z), and = z^i{z). To facilitate our notations, we, henceforth, denote by 
* = ^(z), * = ^(z), * = ^(z), * = ^(z), *j = *i(2;). Suppose that {e°{z),e°{z)}i<i<K are another 
solutions satisfying (90) and let e°, e° be the matrices/vectors by replacing the 

entries 61(2;) 's and 61(2;) 's in SE^, ^, e^, with e°(z)'s and e°(2;)'s respectively. We prove 

the uniqueness of and by showing that 62 — e° = and — e° = 0. 

Denote Tj = diag(0„j , . . . , 0„._j , Tj, On^.,.! , . . . , 0„^) for i = 1, . . . , K. To simplify notation, we let 



TV 

U2,j = §tr (R,*H,^,,T,*2Hf , 

/3, / ~ (92) 

= — tr (Xi'i'H^^^Rj^f H*^) . 
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Moreover, let 



r = 



Til 








Tl2 





Til 


kPri2 








r2i 







^pr2i 








^22 



(93) 



with Tn = [rn,ij] e C^x^, ri2 = [Ti2,ij] G C^^^, r2i = [T2i,ij] G c^x^, r22 = [r22,i,] e C^^^, and 

0, 



22,ij 



U2,ij 



, 1 - U2,v. 

0, i = j, 



21,y 



1 — V2 



1 - U2M ' 



1 - V2-, 



Similarly, let r° as well as T^^, r^2, r2i, and be the matrices by replacing *, *, and * with 
w , # , and * respectively. 

Now, write ei{z) = 61^1(2;) +iei^2{z), ei{z) = ei^i{z) +iei^2{z) and z = zi +]z2- A direct calculation then 
yields 

^{zei{z)} = ^> I ^tr (Ri* (z*"^) | 

Itr I Ri* ( zS - ^ Nl'e*(z)R, + ^ H,*.,(^In, + /3,^e,(z)T,)*2H 

V V 3=1 j=l 

"1 ^ /3 

E ei,2(-^)N|'^tr (Ri*R,*^) + E S>{^e,(2)}f tr (R,*H,*,,T,-*2Hf 

i=i i=i 



K 



f - \z\'lr, 1 *^ 



E ej,2iz)\z\'^ui,ij + ^ 9{zej(z)}u2,ij + ^tr (^Rj* (^S 



(94) 
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where (a) and (b) are obtained by expanding (^z^ ^) and (^—z^^j^ respectively using (91), and ( 
obtained by using the definitions in (92). Similarly we can get 



C IS 



K K 

Q{zei{z)} =J2ej,2{^)\^\'vi,ij + J2^{zej{z)}v2,ij + ^tr (T,*H^*,*f H*^) 



. , . , rii 

K K 

ei,2{z) = 9{^ej(z)}ni,^j + ^ eJ,2{z)u2,^J + ^tr (R,**^) , 

3=1 j=l 

~ei,2{z) = ^ ^{ze^{z)}vi,,i + ^ ej,2{z)v2,^j + ^tr ( T,* ( I, + ^H^*,S*f H j # 



H 



(95) 

(96) 
(97) 



Let 



r) = [61,2(2;), • • ■ , eK,2{z),'^ {zei{z)}, . . . , S> {zexiz)}, 61,2(2:), • • ■ eK,2{z), {zei{z)}, . . . , 5> {zexiz)}]' 



By the definition of V in (93), 77 satisfies 



J7 = rr? + b. 



(98) 



where b = [bf b^^b^bj]^ with bi = [j^^], b 



r b-z^i 



b3=[T^], b4 = [^]GC^and 



b2, 



Z2 

N 

Z2 

N 

Z2 

rii 

Z2 

rii 



tr (Rj**^) , 



tr ( Rj* ( S + H*^*f H^) 



tr ( T,;* ( I„ + ^H^*f S*;,h1 *^ 



(99) 



Let 



T = diag (1 - 1X2,11, • • • , 1 - U2,KK, 1 - ^"2,11, • • • , 1 - '"2, 



KK) 



(100) 



Multiplying both sides of (98) by T gives 



Tri = rVr] + Yb. 



(101) 
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For z G C"'", it is observed that the fohowing quantities 



^i,2{z), '^{zei{z)), 6^,2(2), '^{zei{z)), Vi, 
it2,ij, ■y2,ij, Vz,j, 

&2,i, fes.i, &4,i, Vi. 



(102) 



are all positive. For any matrix A = [ajj], we write A > if Ojj > Vz, j. From (102), we have that 77 > 0, 
Tr > 0, and Tb > 0. As a result, we get Yr/ > [the right-hand side of (101)] and since 77 > 0, we 
conclude that 

1 - U2,ii > 0, 1 - V2,ii > 0, Vi (103) 

Now, all the elements of F, r], and b are shown to be positive. Therefore, from (98) and Lemma 13, we 
get p{T) < 1. Similarly, we also have p(r°) < 1. 

A standard computation involving the resolvent identity (Lemma 3) yields 

ei{z) - e°{z) 
= - -^tr (Ri* (*-^ - *°-^) *°) 

^^f^izejiz) - ze°(^))ltr(R,*R,*°) + ^ ^tr (r.*H,(*,, - 

j=i j=i 

= j^{z~e^{z) - ze°(z))ltr (R,*R,-*°) + f;(e,(^) - e°(z))|tr (R,*H,-*,,T,*;.Hf *°) . 



(104) 



where (a) is obtained by expanding (* ^ — ° i) using (91). Similarly, 



ei{z)-el{z) 

K 



K r, K 

-Y^ize^iz) - ze°{z))^tr (t,#T,.*°) + ^(e-(z) - e°(z))-tr (t,*H^*,R,-*°H*°) 



Now, let T = [e^ eiY and r° = [e°^ e°/ e°/Y . Thus we have 



(105) 



T-T° = A(t-T°), 



(106) 
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where 



A = 



An 








A12 





An 


^2Ai2 








A21 


A22 





^^A21 








A22 



with An = [An,i,] G C^x^, A12 = [Ai2,i,] G C^x^, A21 = [A2i,i,] G C^x^, A22 = [A22,i,] G C^^i^ 
and 

0, i = j, 

^ — 1 z J, 

^ 1 - |tr (Ri*H,^,iT,^°iHf *°) ' 

1 - |tr (Ri*Hi^,iTi*°,Hf *°) ' 

|tr (t,#T,#°) 
1 - ^tr (Ti^^H^*;,Ri*°H*°) ' 

0, i = j, 

^ ^tr (t,*H^*,R,*°H*°) 

~ — ^ — * 7^ J 

1 - itr (t,*,H^^*,R,*°H*°) ' 

Let I • I denote the operator taking the absolute values of the input vector or matrix. It follows 

1 _ , 1 

from Lemma 14 that p{ A) < p(|A|). Applying Lemma 16 with A = ^//3j/N Kl ^Hj^^jT? and = 

1 , Q _ 1 

a//3j7/VT? *°R/ , we have a lower bound for the denominator of An^jj by 

(1 - ^tr (R,*H,*,,T,*°,Hf *°)) 
> (1 - |tr (R,*H,*.^T,*SHf *^)) ' (1 - |tr (R,*°^H,*:f T.^^fif *°)) ' (107) 

where the conditions tr(AA-'^) = ^2,11 < 1 and tr(BB^) = ^2 jj < 1 are satisfied by (103). Applying the 



An,ij = 

Al2,ij = 
^21,ij = 

A22,ij = 
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Cauchy-Schwarz inequality to the numerator of we then obtain from (107) 



|Aii,ij| < 



N 



(108) 



RecalUng the definitions of the entries of F, (108) is equivalent to 



Aii,ij| < 



U2,ij 



1 - U2-, 



l-u'. 



2M 



|rii,iih|rn,ii 



(109) 



Likewise we have 



\^i2,ij\<\Ti2,ij\-An^,ij\--, |A2i,y|<|r2M,|2|r°i_,^.|2, and |A22,i,| < |r22,i,-|^ |r^2,i,|^- (110) 



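We then conclude from Lemmas 14 and 15 that

$$
\rho(|\mathbf{A}|) \le \rho\Big(\big[|\Gamma_{ij}|^{1/2}|\Gamma_{ij}^{\circ}|^{1/2}\big]\Big) \le \rho(\boldsymbol{\Gamma})^{1/2}\rho(\boldsymbol{\Gamma}^{\circ})^{1/2} < 1,
\tag{111}
$$

where the facts $\rho(\boldsymbol{\Gamma}) < 1$ and $\rho(\boldsymbol{\Gamma}^{\circ}) < 1$ were proved before. As pointed out in [7], this contradicts (106), which would make 1 an eigenvalue of $\mathbf{A}$ whenever $\mathbf{t} \neq \mathbf{t}^{\circ}$. Therefore, $\mathbf{e} = \mathbf{e}^{\circ}$ and $\tilde{\mathbf{e}} = \tilde{\mathbf{e}}^{\circ}$ if $z \in \mathbb{C}^{+}$. If $z \in \mathbb{C}^{-}$ or $z \in \mathbb{R}^{-}$, similar arguments apply and the details are omitted here. Theorem 1 is thus proved.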
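The contradiction step rests on Lemma 15: for entrywise nonnegative matrices, the entrywise geometric mean $[\,|\Gamma_{ij}|^{1/2}|\Gamma^{\circ}_{ij}|^{1/2}]$ cannot have a larger spectral radius than $\rho(\boldsymbol{\Gamma})^{1/2}\rho(\boldsymbol{\Gamma}^{\circ})^{1/2}$. The Python sketch below checks this numerically. The matrices are hypothetical random test data, not the $\boldsymbol{\Gamma}$ of (103), and the spectral radius is estimated by plain power iteration, which assumes strictly positive entries so that the Perron root dominates.

```python
import random


def spectral_radius(m, iters=500):
    """Estimate the spectral radius of a strictly positive square matrix by power iteration."""
    n = len(m)
    v = [1.0] * n
    rho = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        rho = max(w)  # v is scaled so that max(v) == 1, so max(w) tends to the Perron root
        v = [x / rho for x in w]
    return rho


def geometric_mean_matrix(a, b):
    """Entrywise (a_ij * b_ij)^(1/2), the matrix appearing in Lemma 15 and (111)."""
    n = len(a)
    return [[(a[i][j] * b[i][j]) ** 0.5 for j in range(n)] for i in range(n)]


def worst_ratio(trials=100, n=4, seed=0):
    """Largest observed rho(geometric mean) / sqrt(rho(a) * rho(b)); Lemma 15 says it is <= 1."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        a = [[rng.uniform(0.01, 1.0) for _ in range(n)] for _ in range(n)]
        b = [[rng.uniform(0.01, 1.0) for _ in range(n)] for _ in range(n)]
        lhs = spectral_radius(geometric_mean_matrix(a, b))
        rhs = (spectral_radius(a) * spectral_radius(b)) ** 0.5
        worst = max(worst, lhs / rhs)
    return worst


print(worst_ratio())
```

Equality holds when the two matrices are proportional, which is why the argument needs the strict bounds $\rho(\boldsymbol{\Gamma}) < 1$ and $\rho(\boldsymbol{\Gamma}^{\circ}) < 1$ rather than the geometric-mean bound alone.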

Appendix C. Proof of Theorem 3 

Recalling (7), we have [29, page 891] 



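$$
V_{B_N}(\sigma^2) = \int_{\sigma^2}^{\infty}\Big(\frac{1}{\omega} - m_{B_N}(-\omega)\Big)\,d\omega, \quad \sigma^2 > 0.
\tag{112}
$$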
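Identity (112) can be checked directly for a fixed spectrum: if $\lambda_1,\ldots,\lambda_N$ are the eigenvalues of $B_N$, then $m_{B_N}(-\omega) = \frac{1}{N}\sum_i(\lambda_i+\omega)^{-1}$, and each term of the integrand integrates to $\log(1+\lambda_i/\sigma^2)$. The sketch below assumes that $V_{B_N}(\sigma^2) = \frac{1}{N}\log\det(\mathbf{I}_N + B_N/\sigma^2)$ as in (7), and compares that value with a Simpson-rule evaluation of the integral after the substitution $\omega = \sigma^2/t$. The eigenvalues are made-up test values.

```python
import math


def mutual_info_direct(eigs, sigma2):
    """(1/N) log det(I + B / sigma2) computed from the eigenvalues of B."""
    return sum(math.log1p(lam / sigma2) for lam in eigs) / len(eigs)


def mutual_info_integral(eigs, sigma2, intervals=2000):
    """Right-hand side of (112): integral over [sigma2, inf) of 1/omega - m(-omega).

    With omega = sigma2 / t, the integrand becomes the smooth function
    g(t) = (1/N) sum_i lambda_i / (lambda_i * t + sigma2) on [0, 1],
    which is integrated by the composite Simpson rule (intervals must be even).
    """
    def g(t):
        return sum(lam / (lam * t + sigma2) for lam in eigs) / len(eigs)

    h = 1.0 / intervals
    total = g(0.0) + g(1.0)
    for k in range(1, intervals):
        total += (4.0 if k % 2 else 2.0) * g(k * h)
    return total * h / 3.0


eigs = [0.0, 0.5, 1.0, 4.0]
print(mutual_info_direct(eigs, 0.5), mutual_info_integral(eigs, 0.5))
```

The substitution is chosen because it maps the infinite interval onto $[0,1]$ and removes the $1/\omega$ singular behaviour, so a plain quadrature rule suffices.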



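In Appendix C.1, we first show that $\mathbb{E}\{V_{B_N}(\sigma^2)\} - V_N(\sigma^2) \to 0$, i.e.,

$$
\int_{\sigma^2}^{\infty}\Big(\frac{1}{\omega} - \mathbb{E}\{m_{B_N}(-\omega)\}\Big)d\omega - \int_{\sigma^2}^{\infty}\Big(\frac{1}{\omega} - \frac{1}{N}\mathrm{tr}\big(\boldsymbol{\Psi}(-\omega)\big)\Big)d\omega \xrightarrow[N\to\infty]{} 0.
\tag{113}
$$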
Next, we show in Appendix C.2 that, under the additional assumptions in Remark 2, (113) can be strengthened to almost sure convergence as $N \to \infty$. Finally, in Appendix C.3, we show that $\int_{\sigma^2}^{\infty}\big(\frac{1}{\omega} - \frac{1}{N}\mathrm{tr}(\boldsymbol{\Psi}(-\omega))\big)\,d\omega$
can be written more explicitly as (15).

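Appendix C.1 Proof of the convergence of $\mathbb{E}\{V_{B_N}(\sigma^2)\} - V_N(\sigma^2)$

By Theorem 2 together with the dominated convergence theorem, we have

$$
\mathbb{E}\{m_{B_N}(-\omega)\} - \frac{1}{N}\mathrm{tr}\big(\boldsymbol{\Psi}(-\omega)\big) \xrightarrow[N\to\infty]{} 0, \quad \omega > 0.
\tag{114}
$$

Let $F_N$ be the probability distribution whose Stieltjes transform is $\frac{1}{N}\mathrm{tr}\,\boldsymbol{\Psi}(z)$. Notice that [29]

$$
\begin{aligned}
\Big|\frac{1}{\omega} - \mathbb{E}\{m_{B_N}(-\omega)\}\Big| &= \mathbb{E}\Big\{\int_0^{\infty}\Big(\frac{1}{\omega} - \frac{1}{\lambda + \omega}\Big)dF^{B_N}(\lambda)\Big\} \le \frac{1}{\omega^2}\,\mathbb{E}\Big\{\int_0^{\infty}\lambda\,dF^{B_N}(\lambda)\Big\},\\
\Big|\frac{1}{\omega} - \frac{1}{N}\mathrm{tr}\big(\boldsymbol{\Psi}(-\omega)\big)\Big| &= \int_0^{\infty}\Big(\frac{1}{\omega} - \frac{1}{\lambda + \omega}\Big)dF_N(\lambda) \le \frac{1}{\omega^2}\int_0^{\infty}\lambda\,dF_N(\lambda).
\end{aligned}
\tag{115}
$$

Also, we notice the following equalities:

$$
\mathbb{E}\Big\{\int_0^{\infty}\lambda\,dF^{B_N}(\lambda)\Big\} = \frac{1}{N}\sum_{k=1}^{K}\frac{\mathrm{tr}(\mathbf{T}_k)\,\mathrm{tr}(\mathbf{R}_k)}{n_k} + \frac{1}{N}\sum_{k=1}^{K}\mathrm{tr}\big(\bar{\mathbf{H}}_k\bar{\mathbf{H}}_k^H\big),
\tag{116}
$$

$$
\int_0^{\infty}\lambda\,dF_N(\lambda) = \frac{1}{N}\sum_{k=1}^{K}\frac{\mathrm{tr}(\mathbf{T}_k)\,\mathrm{tr}(\mathbf{R}_k)}{n_k} + \frac{1}{N}\sum_{k=1}^{K}\mathrm{tr}\big(\bar{\mathbf{H}}_k\bar{\mathbf{H}}_k^H\big).
\tag{117}
$$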

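We will confirm these equalities later. From (116) and (117), together with the constraints in (10), we get

$$
\mathbb{E}\Big\{\int_0^{\infty}\lambda\,dF^{B_N}(\lambda)\Big\} \le 2K \quad\text{and}\quad \int_0^{\infty}\lambda\,dF_N(\lambda) \le 2K.
\tag{118}
$$

As a result, by (115) both integrands in (113) are bounded by $2K/\omega^2$, which is integrable on $[\sigma^2, \infty)$, and (113) follows from the dominated convergence theorem.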
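It remains to check (116) and (117). For (116), a direct calculation yields

$$
\mathbb{E}\Big\{\int_0^{\infty}\lambda\,dF^{B_N}(\lambda)\Big\} = \frac{1}{N}\mathbb{E}\{\mathrm{tr}(\mathbf{H}\mathbf{H}^H)\} = \frac{1}{N}\sum_{k=1}^{K}\mathbb{E}\{\mathrm{tr}(\mathbf{R}_k\mathbf{X}_k\mathbf{T}_k\mathbf{X}_k^H)\} + \frac{1}{N}\sum_{k=1}^{K}\mathrm{tr}\big(\bar{\mathbf{H}}_k\bar{\mathbf{H}}_k^H\big)
\tag{119}
$$

$$
= \frac{1}{N}\sum_{k=1}^{K}\frac{\mathrm{tr}(\mathbf{T}_k)\,\mathrm{tr}(\mathbf{R}_k)}{n_k} + \frac{1}{N}\sum_{k=1}^{K}\mathrm{tr}\big(\bar{\mathbf{H}}_k\bar{\mathbf{H}}_k^H\big),
\tag{120}
$$

where the cross terms in (119) vanish because the entries of $\mathbf{X}_k$ have zero mean.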



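The equality (117) can be obtained by using [29, (C.4)]:

$$
\int_0^{\infty}\lambda\,dF_N(\lambda) = \lim_{y\to\infty}\Re\Big\{-\mathrm{i}y\Big(\mathrm{i}y\,\frac{1}{N}\mathrm{tr}\big(\boldsymbol{\Psi}(\mathrm{i}y)\big) + 1\Big)\Big\}.
\tag{121}
$$

The proof that (121) equals the right-hand side of (117) is similar to that of [29, Lemma C.1] and is therefore
omitted. The proof of (14) is complete.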
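Relation (121) recovers the first moment of a measure from its Stieltjes transform along the imaginary axis. As a numerical illustration only (not part of the proof), the sketch below applies it to the empirical measure of a few hypothetical eigenvalues, for which $\int\lambda\,dF(\lambda)$ is simply their mean. The limit $y\to\infty$ is replaced by a single large $y$, chosen to balance the $O(y^{-2})$ truncation error against floating-point cancellation in $\mathrm{i}y\,m(\mathrm{i}y) + 1$.

```python
def stieltjes(eigs, z):
    """Stieltjes transform (1/N) sum_i 1 / (lambda_i - z) of the empirical measure."""
    return sum(1.0 / (lam - z) for lam in eigs) / len(eigs)


def first_moment_via_stieltjes(eigs, y=1e5):
    """Approximate lim_{y -> inf} Re{-iy (iy m(iy) + 1)} from (121) at one large y."""
    z = 1j * y
    return (-1j * y * (z * stieltjes(eigs, z) + 1.0)).real


eigs = [0.5, 1.0, 2.5, 4.0]
print(first_moment_via_stieltjes(eigs), sum(eigs) / len(eigs))
```

For a discrete measure the real part equals $\frac{1}{N}\sum_i \lambda_i y^2/(\lambda_i^2 + y^2)$, so the approximation approaches the mean from below as $y$ grows.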

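Appendix C.2 Proof of $V_{B_N}(\sigma^2) - V_N(\sigma^2) \to 0$ almost surely

Following [7], one can verify (16) under the assumptions of Theorem 4.
As for Remark 2, similarly to (119), write

$$
\int_0^{\infty}\lambda\,dF^{B_N}(\lambda) = \frac{1}{N}\sum_{k=1}^{K}\mathrm{tr}(\mathbf{R}_k\mathbf{X}_k\mathbf{T}_k\mathbf{X}_k^H) + \frac{2}{N}\sum_{k=1}^{K}\Re\,\mathrm{tr}\big(\mathbf{R}_k^{1/2}\mathbf{X}_k\mathbf{T}_k^{1/2}\bar{\mathbf{H}}_k^H\big) + \frac{1}{N}\sum_{k=1}^{K}\mathrm{tr}\big(\bar{\mathbf{H}}_k\bar{\mathbf{H}}_k^H\big).
\tag{122}
$$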

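Furthermore, write

$$
\frac{1}{N}\sum_{k=1}^{K}\mathrm{tr}(\mathbf{R}_k\mathbf{X}_k\mathbf{T}_k\mathbf{X}_k^H) = \frac{1}{N}\sum_{k=1}^{K}\sum_{i=1}^{N}\sum_{j=1}^{N}R^{(k)}_{ij}\,\mathbf{x}_{k,j}^T\mathbf{T}_k\mathbf{x}_{k,i}^{*}
\tag{123}
$$

and

$$
\frac{1}{N}\sum_{k=1}^{K}\mathrm{tr}\big(\mathbf{R}_k^{1/2}\mathbf{X}_k\mathbf{T}_k^{1/2}\bar{\mathbf{H}}_k^H\big) = \frac{1}{N}\sum_{k=1}^{K}\sum_{j=1}^{N}\mathbf{x}_{k,j}^T\mathbf{y}_{k,j},
\tag{124}
$$

where $\mathbf{R}_k = [R^{(k)}_{ij}]$, $\mathbf{x}_{k,j}^T$ is the $j$th row of $\mathbf{X}_k$, and $\mathbf{y}_{k,j} = \mathbf{T}_k^{1/2}\bar{\mathbf{H}}_k^H\mathbf{R}_k^{1/2}\mathbf{e}_j$. Note that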
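It follows from Lemma 7, Lemma 9 and Lemma 11 that

$$
\begin{aligned}
\mathbb{E}\Big\{\Big|\frac{1}{N}\sum_{k=1}^{K}\sum_{i=1}^{N}R^{(k)}_{ii}\Big(\mathbf{x}_{k,i}^T\mathbf{T}_k\mathbf{x}_{k,i}^{*} - \frac{1}{n_k}\mathrm{tr}(\mathbf{T}_k)\Big)\Big|^2\Big\}
&\le \frac{C}{N^2}\sum_{k=1}^{K}\sum_{i=1}^{N}\big|R^{(k)}_{ii}\big|^2\,\mathbb{E}\Big\{\Big|\mathbf{x}_{k,i}^T\mathbf{T}_k\mathbf{x}_{k,i}^{*} - \frac{1}{n_k}\mathrm{tr}(\mathbf{T}_k)\Big|^2\Big\}\\
&\le \frac{C'}{N^2}\sum_{k=1}^{K}\sum_{i=1}^{N}\big|R^{(k)}_{ii}\big|^2\,\frac{1}{n_k^2}\mathrm{tr}(\mathbf{T}_k^2)\\
&\le C''\sum_{k=1}^{K}\frac{1}{N^2n_k^2}\,\mathrm{tr}(\mathbf{R}_k^2)\,\mathrm{tr}(\mathbf{T}_k^2).
\end{aligned}
\tag{126}
$$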



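This, together with the Borel-Cantelli lemma, implies

$$
\frac{1}{N}\sum_{k=1}^{K}\sum_{i=1}^{N}R^{(k)}_{ii}\,\mathbf{x}_{k,i}^T\mathbf{T}_k\mathbf{x}_{k,i}^{*} - \frac{1}{N}\sum_{k=1}^{K}\frac{\mathrm{tr}(\mathbf{T}_k)\,\mathrm{tr}(\mathbf{R}_k)}{n_k} \xrightarrow{\text{a.s.}} 0.
\tag{127}
$$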



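A direct calculation indicates that

$$
\mathbb{E}\Big\{\Big|\frac{1}{N}\sum_{k=1}^{K}\sum_{i\neq j}R^{(k)}_{ij}\,\mathbf{x}_{k,j}^T\mathbf{T}_k\mathbf{x}_{k,i}^{*}\Big|^2\Big\} \le C'\sum_{k=1}^{K}\frac{1}{N^2n_k^2}\,\mathrm{tr}(\mathbf{R}_k^2)\,\mathrm{tr}(\mathbf{T}_k^2).
\tag{128}
$$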



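It follows that

$$
\frac{1}{N}\sum_{k=1}^{K}\mathrm{tr}(\mathbf{R}_k\mathbf{X}_k\mathbf{T}_k\mathbf{X}_k^H) - \frac{1}{N}\sum_{k=1}^{K}\frac{\mathrm{tr}(\mathbf{T}_k)\,\mathrm{tr}(\mathbf{R}_k)}{n_k} \xrightarrow{\text{a.s.}} 0.
\tag{129}
$$

Similarly, we have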



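$$
\begin{aligned}
\mathbb{E}\Big\{\Big|\frac{1}{N}\sum_{k=1}^{K}\sum_{j=1}^{N}\mathbf{x}_{k,j}^T\mathbf{y}_{k,j}\Big|^2\Big\}
&\le C\sum_{k=1}^{K}\frac{1}{N^2n_k}\,\mathrm{tr}\big(\mathbf{T}_k\bar{\mathbf{H}}_k^H\mathbf{R}_k\bar{\mathbf{H}}_k\big)\\
&\le C'\sum_{k=1}^{K}\frac{1}{N^2n_k}\big(\mathrm{tr}(\mathbf{T}_k^2)\big)^{1/2}\big(\mathrm{tr}(\mathbf{R}_k^4)\big)^{1/4}\Big(\mathrm{tr}\big((\bar{\mathbf{H}}_k^H\bar{\mathbf{H}}_k)^2\big)\Big)^{1/2},
\end{aligned}
\tag{130}
$$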



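which implies that

$$
\frac{1}{N}\sum_{k=1}^{K}\mathrm{tr}\big(\mathbf{R}_k^{1/2}\mathbf{X}_k\mathbf{T}_k^{1/2}\bar{\mathbf{H}}_k^H\big) \xrightarrow{\text{a.s.}} 0.
\tag{131}
$$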



It follows from the generalized dominated convergence theorem, (129), (131), (112) and (115) that Remark 
2 is true. 
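The expansions (123) and (124) are purely algebraic, so they can be sanity-checked on small random matrices. In the hypothetical sketch below, the common factor $1/N$ and the sum over $k$ are dropped (one user term is enough), and the square roots $\mathbf{R}_k^{1/2}$, $\mathbf{T}_k^{1/2}$ in (124) are replaced by arbitrary square matrices `P` and `Q`, because the identity only uses the product structure; all test data are made up.

```python
import random


def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def herm(a):
    """Conjugate transpose."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def rand_complex(rows, cols, rng):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(cols)]
            for _ in range(rows)]


def check_expansions(N=4, n=3, seed=1):
    """Return the absolute errors of the expansions (123) and (124) on random data."""
    rng = random.Random(seed)
    R, X, T = rand_complex(N, N, rng), rand_complex(N, n, rng), rand_complex(n, n, rng)
    P, Q, Hbar = rand_complex(N, N, rng), rand_complex(n, n, rng), rand_complex(N, n, rng)

    # (123): tr(R X T X^H) = sum_{i,j} R_ij x_j^T T x_i^*, with x_j^T the j-th row of X.
    lhs1 = trace(mul(mul(mul(R, X), T), herm(X)))
    rhs1 = 0j
    for i in range(N):
        for j in range(N):
            quad = sum(X[j][a] * T[a][b] * X[i][b].conjugate()
                       for a in range(n) for b in range(n))
            rhs1 += R[i][j] * quad

    # (124) with P, Q in place of R^{1/2}, T^{1/2}:
    # tr(P X Q Hbar^H) = sum_j x_j^T y_j, where y_j = Q Hbar^H P e_j is column j of Y.
    lhs2 = trace(mul(mul(mul(P, X), Q), herm(Hbar)))
    Y = mul(mul(Q, herm(Hbar)), P)
    rhs2 = sum(X[j][a] * Y[a][j] for j in range(N) for a in range(n))
    return abs(lhs1 - rhs1), abs(lhs2 - rhs2)


print(check_expansions())
```

Both returned errors are at the level of floating-point rounding, which is what one expects from exact algebraic identities.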
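Appendix C.3 Explicit Expression of $\int_{\sigma^2}^{\infty}\big(\frac{1}{\omega} - \frac{1}{N}\mathrm{tr}(\boldsymbol{\Psi}(-\omega))\big)\,d\omega$

In this appendix, we will prove

$$
\int_{\sigma^2}^{\infty}\Big(\frac{1}{\omega} - \frac{1}{N}\mathrm{tr}\big(\boldsymbol{\Psi}(-\omega)\big)\Big)d\omega = V_N(\sigma^2),
\tag{132}
$$

or equivalently,

$$
\frac{dV_N(\sigma^2)}{d\sigma^2} = \frac{1}{N}\mathrm{tr}\big(\boldsymbol{\Psi}(-\sigma^2)\big) - \frac{1}{\sigma^2}.
\tag{133}
$$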



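The right-hand side of (133) can be re-expressed as

$$
\begin{aligned}
\frac{1}{N}\mathrm{tr}\big(\boldsymbol{\Psi}(-\sigma^2)\big) - \frac{1}{\sigma^2} &= \frac{1}{N}\mathrm{tr}\big(\boldsymbol{\Psi}(-\sigma^2) - (\sigma^2\mathbf{I}_N)^{-1}\big)\\
&\overset{(a)}{=} -\frac{1}{\sigma^2N}\mathrm{tr}\Big(\boldsymbol{\Psi}(-\sigma^2)\big(\boldsymbol{\Psi}(-\sigma^2)^{-1} - \sigma^2\mathbf{I}_N\big)\Big)\\
&\overset{(b)}{=} -\sum_{i=1}^{K}\beta_i e_i(-\sigma^2)\tilde{e}_i(-\sigma^2) - \frac{1}{\sigma^2N}\mathrm{tr}\big(\boldsymbol{\Psi}(-\sigma^2)\bar{\mathbf{H}}\tilde{\boldsymbol{\Phi}}(-\sigma^2)\bar{\mathbf{H}}^H\big),
\end{aligned}
\tag{134}
$$

where (a) is due to the resolvent identity (Lemma 3) and (b) follows directly from the definitions of $e_i(-\sigma^2)$
and $\tilde{e}_i(-\sigma^2)$. We then prove that (134) equals the left-hand side of (133). To this end, we define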



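$$
\begin{aligned}
V(\sigma^2,\boldsymbol{\kappa},\tilde{\boldsymbol{\kappa}}) ={}& \frac{1}{N}\log\det\Big(\mathbf{I}_N + \sum_{i=1}^{K}\beta_i\tilde{\kappa}_i\mathbf{R}_i + \frac{1}{\sigma^2}\bar{\mathbf{H}}\big(\mathbf{I}_n + \mathrm{diag}(\beta_1\kappa_1\mathbf{T}_1,\ldots,\beta_K\kappa_K\mathbf{T}_K)\big)^{-1}\bar{\mathbf{H}}^H\Big)\\
&+ \frac{1}{N}\sum_{i=1}^{K}\log\det\big(\mathbf{I}_{n_i} + \beta_i\kappa_i\mathbf{T}_i\big) - \sigma^2\sum_{i=1}^{K}\beta_i\kappa_i\tilde{\kappa}_i.
\end{aligned}
\tag{135}
$$

Note that $V(\sigma^2,\mathbf{e}_\sigma,\tilde{\mathbf{e}}_\sigma) = V_N(\sigma^2)$ with $\mathbf{e}_\sigma = [e_i(-\sigma^2)] \in \mathbb{R}^K$ and $\tilde{\mathbf{e}}_\sigma = [\tilde{e}_i(-\sigma^2)] \in \mathbb{R}^K$. The derivative of
$V_N(\sigma^2)$ can be expressed as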



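$$
\frac{dV_N(\sigma^2)}{d\sigma^2} = \frac{\partial V}{\partial\sigma^2} + \sum_{i=1}^{K}\frac{\partial V}{\partial\kappa_i}\,\frac{de_i(-\sigma^2)}{d\sigma^2} + \sum_{i=1}^{K}\frac{\partial V}{\partial\tilde{\kappa}_i}\,\frac{d\tilde{e}_i(-\sigma^2)}{d\sigma^2},
\tag{136}
$$

where all partial derivatives are evaluated at $(\sigma^2,\mathbf{e}_\sigma,\tilde{\mathbf{e}}_\sigma)$. It can be checked that

$$
\frac{\partial V}{\partial\kappa_i}\Big|_{(\sigma^2,\mathbf{e}_\sigma,\tilde{\mathbf{e}}_\sigma)} = 0 \quad\text{and}\quad \frac{\partial V}{\partial\tilde{\kappa}_i}\Big|_{(\sigma^2,\mathbf{e}_\sigma,\tilde{\mathbf{e}}_\sigma)} = 0, \quad\text{for } i = 1,\ldots,K.
\tag{137}
$$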
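Therefore, we have

$$
\frac{dV_N(\sigma^2)}{d\sigma^2} = \frac{\partial V}{\partial\sigma^2}\Big|_{(\sigma^2,\mathbf{e}_\sigma,\tilde{\mathbf{e}}_\sigma)} = -\frac{1}{\sigma^2N}\mathrm{tr}\big(\boldsymbol{\Psi}(-\sigma^2)\bar{\mathbf{H}}\tilde{\boldsymbol{\Phi}}(-\sigma^2)\bar{\mathbf{H}}^H\big) - \sum_{i=1}^{K}\beta_i e_i(-\sigma^2)\tilde{e}_i(-\sigma^2),
\tag{138}
$$

which is identical to (134); this completes the proof.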

Appendix D. Mathematical Tools 

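In this appendix, we provide some mathematical tools needed in the proofs of Appendices A-C.

Lemma 2 [51]

1) Let $\mathbf{A} = [A_{ij}]$ and $\mathbf{B}$ be any matrices such that the product $\mathbf{A}\mathbf{B}$ exists and is a square matrix. Then

i) $|\mathrm{tr}(\mathbf{A}\mathbf{B})| \le \|\mathbf{A}\|_F\|\mathbf{B}\|_F$,

ii) $\|\mathbf{A}\mathbf{B}\|_F \le \|\mathbf{A}\|_F\|\mathbf{B}\|$,

iii) $\|\mathbf{A}\mathbf{B}\|_F \le \|\mathbf{A}\|_F\|\mathbf{B}\|_F$,

iv) $|A_{ij}| \le \|\mathbf{A}\|$.

2) If $\mathbf{A}$ is nonnegative definite, we have $|\mathrm{tr}(\mathbf{A}\mathbf{B})| \le \|\mathbf{B}\|\,\mathrm{tr}(\mathbf{A})$.

3) Let $\mathbf{A}$ be any matrix such that the product $\mathbf{A}\mathbf{B}$ exists. Then $\|\mathbf{A}\mathbf{B}\| \le \|\mathbf{A}\|\|\mathbf{B}\|$.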

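Lemma 3 (Resolvent Identity [29]) For invertible matrices $\mathbf{A}$ and $\mathbf{B}$, we have the identity

$$
\mathbf{A}^{-1} - \mathbf{B}^{-1} = -\mathbf{A}^{-1}(\mathbf{A} - \mathbf{B})\mathbf{B}^{-1}.
$$

Lemma 4 ([50, 0.4.5 and 0.4.6]) Some fundamental equality and inequalities involving the rank are:

i) If $\mathbf{A} \in \mathbb{C}^{m\times n}$, $\mathrm{rank}(\mathbf{A}) = \mathrm{rank}(\mathbf{A}^H)$.

ii) If $\mathbf{A} \in \mathbb{C}^{m\times k}$ and $\mathbf{B} \in \mathbb{C}^{k\times n}$, $\mathrm{rank}(\mathbf{A}\mathbf{B}) \le \min\{\mathrm{rank}(\mathbf{A}), \mathrm{rank}(\mathbf{B})\}$.

iii) If $\mathbf{A}, \mathbf{B} \in \mathbb{C}^{m\times n}$, $\mathrm{rank}(\mathbf{A} + \mathbf{B}) \le \mathrm{rank}(\mathbf{A}) + \mathrm{rank}(\mathbf{B})$.

iv) If $\mathbf{A}, \mathbf{B} \in \mathbb{C}^{m\times n}$, $\mathrm{rank}([\mathbf{A}\ \mathbf{B}]) \le \mathrm{rank}(\mathbf{A}) + \mathrm{rank}(\mathbf{B})$.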
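The resolvent identity of Lemma 3 is easy to sanity-check with exact arithmetic. The sketch below is a hypothetical illustration with two arbitrarily chosen invertible $2\times 2$ matrices; it uses Python's `fractions` module so that the comparison is exact rather than up to rounding.

```python
from fractions import Fraction


def mat(rows):
    return [[Fraction(x) for x in row] for row in rows]


def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(len(a[0]))] for i in range(len(a))]


def inv2(m):
    """Exact inverse of an invertible 2 x 2 matrix with Fraction entries."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def resolvent_identity_holds(a, b):
    """Check A^{-1} - B^{-1} == -A^{-1} (A - B) B^{-1} exactly (Lemma 3)."""
    lhs = sub(inv2(a), inv2(b))
    rhs = [[-x for x in row] for row in mul(mul(inv2(a), sub(a, b)), inv2(b))]
    return lhs == rhs


A = mat([[2, 1], [1, 3]])
B = mat([[1, 0], [2, 5]])
print(resolvent_identity_holds(A, B))  # True
```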

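Lemma 5 ([33, Theorem A.44]) Let $\mathbf{A}_1$ and $\mathbf{A}_2$ be two $N\times n$ matrices. If $\mathbf{S}$ and $\mathbf{D}$ are Hermitian matrices
of orders $N\times N$ and $n\times n$, respectively, then we have

$$
\sup_x\Big|F^{\mathbf{S}+\mathbf{A}_1\mathbf{D}\mathbf{A}_1^H}(x) - F^{\mathbf{S}+\mathbf{A}_2\mathbf{D}\mathbf{A}_2^H}(x)\Big| \le \frac{1}{N}\mathrm{rank}(\mathbf{A}_1 - \mathbf{A}_2).
$$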



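Lemma 6 Let $\mathbf{S}$ be a Hermitian matrix of order $N\times N$, $\mathbf{A}_1$ and $\mathbf{A}_2$ be $N\times n$ complex matrices, and $\mathbf{B}_1$,
$\mathbf{B}_2$, $\mathbf{C}_1$, $\mathbf{C}_2$, and $\mathbf{D}$ be any matrices such that $\mathbf{B}_1\mathbf{D}\mathbf{C}_1$ and $\mathbf{B}_2\mathbf{D}\mathbf{C}_2$ exist and are of order $N\times n$. Then,

$$
\sup_x\Big|F^{\mathbf{S}+(\mathbf{A}_1+\mathbf{B}_1\mathbf{D}\mathbf{C}_1)(\mathbf{A}_1+\mathbf{B}_1\mathbf{D}\mathbf{C}_1)^H}(x) - F^{\mathbf{S}+(\mathbf{A}_2+\mathbf{B}_2\mathbf{D}\mathbf{C}_2)(\mathbf{A}_2+\mathbf{B}_2\mathbf{D}\mathbf{C}_2)^H}(x)\Big| \le \frac{1}{N}\big(\mathrm{rank}(\mathbf{A}_1-\mathbf{A}_2) + \mathrm{rank}(\mathbf{B}_1-\mathbf{B}_2) + \mathrm{rank}(\mathbf{C}_1-\mathbf{C}_2)\big).
$$

Proof:

$$
\begin{aligned}
&\sup_x\Big|F^{\mathbf{S}+(\mathbf{A}_1+\mathbf{B}_1\mathbf{D}\mathbf{C}_1)(\mathbf{A}_1+\mathbf{B}_1\mathbf{D}\mathbf{C}_1)^H}(x) - F^{\mathbf{S}+(\mathbf{A}_2+\mathbf{B}_2\mathbf{D}\mathbf{C}_2)(\mathbf{A}_2+\mathbf{B}_2\mathbf{D}\mathbf{C}_2)^H}(x)\Big|\\
&\quad\overset{(a)}{\le} \frac{1}{N}\mathrm{rank}\big((\mathbf{A}_1+\mathbf{B}_1\mathbf{D}\mathbf{C}_1) - (\mathbf{A}_2+\mathbf{B}_2\mathbf{D}\mathbf{C}_2)\big)\\
&\quad\overset{(b)}{\le} \frac{1}{N}\big(\mathrm{rank}(\mathbf{A}_1-\mathbf{A}_2) + \mathrm{rank}(\mathbf{B}_1\mathbf{D}\mathbf{C}_1 - \mathbf{B}_2\mathbf{D}\mathbf{C}_2)\big)\\
&\quad= \frac{1}{N}\Big(\mathrm{rank}(\mathbf{A}_1-\mathbf{A}_2) + \mathrm{rank}\big((\mathbf{B}_1\mathbf{D}\mathbf{C}_1 - \mathbf{B}_2\mathbf{D}\mathbf{C}_1) + (\mathbf{B}_2\mathbf{D}\mathbf{C}_1 - \mathbf{B}_2\mathbf{D}\mathbf{C}_2)\big)\Big)\\
&\quad\le \frac{1}{N}\big(\mathrm{rank}(\mathbf{A}_1-\mathbf{A}_2) + \mathrm{rank}(\mathbf{B}_1-\mathbf{B}_2) + \mathrm{rank}(\mathbf{C}_1-\mathbf{C}_2)\big),
\end{aligned}
$$

where (a) follows from Lemma 5 (applied with the identity as its middle Hermitian matrix), (b) follows from iii) of Lemma 4, and the last inequality follows from ii)
and iii) of Lemma 4. □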
Lemma 7 For any $p \ge 1$ and real numbers $a_i$, we have

$$
\Big|\sum_{i=1}^{n}a_i\Big|^p \le n^{p-1}\sum_{i=1}^{n}|a_i|^p.
$$

Proof: This lemma follows from a simple application of Hölder's inequality. □
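For completeness, the Hölder step behind Lemma 7 can be written out as follows (a worked derivation, not quoted from the source), with $q$ the conjugate exponent, $1/p + 1/q = 1$, for $p > 1$; the case $p = 1$ is just the triangle inequality:

```latex
\Big|\sum_{i=1}^{n} a_i\Big| \le \sum_{i=1}^{n} 1\cdot|a_i|
\le \Big(\sum_{i=1}^{n} 1^{q}\Big)^{1/q}\Big(\sum_{i=1}^{n}|a_i|^{p}\Big)^{1/p}
= n^{1-1/p}\Big(\sum_{i=1}^{n}|a_i|^{p}\Big)^{1/p}
\quad\Longrightarrow\quad
\Big|\sum_{i=1}^{n} a_i\Big|^{p} \le n^{p-1}\sum_{i=1}^{n}|a_i|^{p}.
```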



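Lemma 8 (Elementary Inequality [33, page 29]) If the $X_i$'s are independent with zero means, then, for any $p \ge 2$,

$$
\mathbb{E}\Big\{\Big|\sum_i X_i\Big|^p\Big\} \le C_p\Big(\sum_i\mathbb{E}\{|X_i|^p\} + \Big(\sum_i\mathbb{E}\{|X_i|^2\}\Big)^{p/2}\Big).
$$

Lemma 9 (Burkholder's Inequality [49] or [33, Lemma 2.12]) Let $\{X_i\}$ be a complex martingale difference
sequence with respect to the increasing $\sigma$-field $\{\mathcal{F}_i\}$. Then, for $p > 1$, we have

$$
\mathbb{E}\Big\{\Big|\sum_i X_i\Big|^p\Big\} \le K_p\,\mathbb{E}\Big\{\Big(\sum_i|X_i|^2\Big)^{p/2}\Big\}.
$$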



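Lemma 10 Let $\mathbf{x} = [X_i/\sqrt{N}] \in \mathbb{C}^N$ be a random vector, where the $X_i$'s are independent complex random
variables with zero mean and unit variance, and let $\mathbf{c} = [c_i] \in \mathbb{C}^N$ be a deterministic vector independent of
$\mathbf{x}$. Assume that the $|X_i|$'s are bounded by $\varepsilon\sqrt{N}$ for a constant $\varepsilon$ and that $\|\mathbf{c}\|$ is bounded by a constant $C$. Then, for any $p \ge 1$, we have

$$
\mathbb{E}\{|\mathbf{c}^H\mathbf{x}|^{2p}\} = O(N^{-1}),
\tag{139}
$$

and, for $p \ge 2$,

$$
\mathbb{E}\{\|\mathbf{x}\|^{2p}\} = O(1).
\tag{140}
$$

Proof: We will frequently use the fact that if $|X_i| \le \varepsilon\sqrt{N}$, then $\mathbb{E}\{|X_i|^3\} \le \varepsilon\sqrt{N}\,\mathbb{E}\{|X_i|^2\} = \varepsilon\sqrt{N}$
and, more generally, $\mathbb{E}\{|X_i|^p\} \le (\varepsilon\sqrt{N})^{p-2}\,\mathbb{E}\{|X_i|^2\} = (\varepsilon\sqrt{N})^{p-2}$ for $p \ge 2$.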
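We first prove (139). Using Lemma 8, we have, for $p \ge 1$,

$$
\begin{aligned}
\mathbb{E}\{|\mathbf{c}^H\mathbf{x}|^{2p}\} &= \mathbb{E}\Big\{\Big|\frac{1}{\sqrt{N}}\sum_{i=1}^{N}c_i^{*}X_i\Big|^{2p}\Big\}
\le \frac{C_p}{N^p}\Big(\sum_{i=1}^{N}|c_i|^{2p}\,\mathbb{E}\{|X_i|^{2p}\} + \Big(\sum_{i=1}^{N}|c_i|^2\,\mathbb{E}\{|X_i|^2\}\Big)^p\Big)\\
&\le \frac{C_p}{N^p}\Big((\varepsilon\sqrt{N})^{2p-2}\sum_{i=1}^{N}|c_i|^{2p} + \|\mathbf{c}\|^{2p}\Big) = O(N^{-1}).
\end{aligned}
$$

Next, we prove (140). For any $p \ge 2$, we have

$$
\begin{aligned}
\mathbb{E}\{\|\mathbf{x}\|^{2p}\} = \mathbb{E}\{|\mathbf{x}^H\mathbf{x}|^p\} &\le C\,\mathbb{E}\big\{|\mathbf{x}^H\mathbf{x} - \mathbb{E}\{\mathbf{x}^H\mathbf{x}\}|^p\big\} + C\big(\mathbb{E}\{\mathbf{x}^H\mathbf{x}\}\big)^p\\
&\le \frac{CC_p}{N^p}\Big(\sum_{i=1}^{N}\mathbb{E}\big\{\big||X_i|^2 - \mathbb{E}\{|X_i|^2\}\big|^p\big\} + \Big(\sum_{i=1}^{N}\mathbb{E}\big\{\big||X_i|^2 - \mathbb{E}\{|X_i|^2\}\big|^2\big\}\Big)^{p/2}\Big) + C\\
&\le \frac{C'C_p}{N^p}\Big(N(\varepsilon\sqrt{N})^{2p-2} + (N\cdot\varepsilon^2N)^{p/2}\Big) + C = O(1),
\end{aligned}
$$

where the first inequality follows from Lemma 7 and the second inequality follows from Lemma 8. □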

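Lemma 11 ([33, Lemma B.26]) Let $\mathbf{A} \in \mathbb{C}^{n\times n}$ be a nonrandom matrix and $\mathbf{x} = [X_i] \in \mathbb{C}^n$ be a random
vector of independent entries. Assume that $\mathbb{E}\{X_i\} = 0$, $\mathbb{E}\{|X_i|^2\} = 1$, and $\mathbb{E}\{|X_i|^{\ell}\} \le \nu_{\ell}$. Then, for any $p \ge 1$,

$$
\mathbb{E}\{|\mathbf{x}^H\mathbf{A}\mathbf{x} - \mathrm{tr}(\mathbf{A})|^p\} \le C_p\Big(\big(\nu_4\,\mathrm{tr}(\mathbf{A}\mathbf{A}^H)\big)^{p/2} + \nu_{2p}\,\mathrm{tr}\big((\mathbf{A}\mathbf{A}^H)^{p/2}\big)\Big).
$$

Lemma 12 [51, 0.7.3] Let $\mathbf{A}$ be partitioned as

$$
\mathbf{A} = \begin{bmatrix}\mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22}\end{bmatrix}.
$$

Invertibility is assumed for any subblock whose inverse is indicated. Then,

$$
\mathbf{A}^{-1} = \begin{bmatrix}(\mathbf{A}_{11} - \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{A}_{21})^{-1} & \mathbf{A}_{11}^{-1}\mathbf{A}_{12}(\mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{A}_{12} - \mathbf{A}_{22})^{-1}\\ (\mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{A}_{12} - \mathbf{A}_{22})^{-1}\mathbf{A}_{21}\mathbf{A}_{11}^{-1} & (\mathbf{A}_{22} - \mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{A}_{12})^{-1}\end{bmatrix}.
$$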
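The block-inverse formula of Lemma 12 can be verified exactly on a small example. In the hypothetical sketch below, a $4\times 4$ matrix is split into $2\times 2$ blocks, chosen so that every inverse the formula uses exists. The formula is evaluated block by block with rational arithmetic, and the product of the matrix with the assembled inverse is compared with the identity.

```python
from fractions import Fraction


def mat(rows):
    return [[Fraction(x) for x in row] for row in rows]


def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(len(a[0]))] for i in range(len(a))]


def inv2(m):
    """Exact inverse of an invertible 2 x 2 matrix with Fraction entries."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def block_inverse(a11, a12, a21, a22):
    """Assemble A^{-1} from 2 x 2 blocks using the formula of Lemma 12."""
    a11i, a22i = inv2(a11), inv2(a22)
    s = inv2(sub(mul(mul(a21, a11i), a12), a22))  # (A21 A11^{-1} A12 - A22)^{-1}
    top_left = inv2(sub(a11, mul(mul(a12, a22i), a21)))
    top_right = mul(mul(a11i, a12), s)
    bottom_left = mul(mul(s, a21), a11i)
    bottom_right = inv2(sub(a22, mul(mul(a21, a11i), a12)))
    return ([l + r for l, r in zip(top_left, top_right)]
            + [l + r for l, r in zip(bottom_left, bottom_right)])


a11, a12 = mat([[4, 1], [2, 3]]), mat([[1, 0], [0, 1]])
a21, a22 = mat([[0, 1], [1, 0]]), mat([[5, 2], [1, 4]])
full = [l + r for l, r in zip(a11, a12)] + [l + r for l, r in zip(a21, a22)]
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
print(mul(full, block_inverse(a11, a12, a21, a22)) == identity)  # True
```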
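Lemma 13 [7, Lemma 9] If the components of $\mathbf{C}$, $\mathbf{x}$, and $\mathbf{b}$ are all positive, then $\mathbf{x} = \mathbf{C}\mathbf{x} + \mathbf{b}$ implies
$\rho(\mathbf{C}) < 1$.

Lemma 14 [50, Theorem 8.1.18] Let $\mathbf{A} = [A_{ij}]$ and $\mathbf{B} = [B_{ij}]$ be square matrices. If $|A_{ij}| \le B_{ij}$ for all $i, j$,
then $\rho(\mathbf{A}) \le \rho(|\mathbf{A}|) \le \rho(\mathbf{B})$.

Lemma 15 [51, Lemma 5.7.9] Let $\mathbf{A} = [A_{ij}]$ and $\mathbf{B} = [B_{ij}]$ be matrices with nonnegative elements. Then
$\rho\big([A_{ij}^{1/2}B_{ij}^{1/2}]\big) \le \rho(\mathbf{A})^{1/2}\rho(\mathbf{B})^{1/2}$.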

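Lemma 16 Let $\mathbf{A}$ and $\mathbf{B}$ be any matrices such that $\mathbf{A}\mathbf{B}^H$ exists and is a square matrix. If $\mathrm{tr}(\mathbf{A}\mathbf{A}^H) < 1$
and $\mathrm{tr}(\mathbf{B}\mathbf{B}^H) < 1$, then

$$
|1 - \mathrm{tr}(\mathbf{A}\mathbf{B}^H)| \ge \big(1 - \mathrm{tr}(\mathbf{A}\mathbf{A}^H)\big)^{1/2}\big(1 - \mathrm{tr}(\mathbf{B}\mathbf{B}^H)\big)^{1/2}.
$$

Proof: For real numbers $a$ and $b$ with $a, b \in [0, 1]$, it is easily shown that

$$
(1 - a)^{1/2}(1 - b)^{1/2} \le 1 - \sqrt{ab}.
\tag{141}
$$

Let $a = \mathrm{tr}(\mathbf{A}\mathbf{A}^H)$ and $b = \mathrm{tr}(\mathbf{B}\mathbf{B}^H)$. Plugging $a$ and $b$ into (141), we obtain

$$
\big(1 - \mathrm{tr}(\mathbf{A}\mathbf{A}^H)\big)^{1/2}\big(1 - \mathrm{tr}(\mathbf{B}\mathbf{B}^H)\big)^{1/2} \le 1 - \sqrt{\mathrm{tr}(\mathbf{A}\mathbf{A}^H)\,\mathrm{tr}(\mathbf{B}\mathbf{B}^H)} \le 1 - |\mathrm{tr}(\mathbf{A}\mathbf{B}^H)| \le |1 - \mathrm{tr}(\mathbf{A}\mathbf{B}^H)|,
$$

where the second inequality follows from Lemma 2. □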
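The step behind (141), which the proof describes as easily shown, can be written out as follows (a worked derivation, not quoted from the source):

```latex
(1-a)(1-b) \le \big(1-\sqrt{ab}\big)^2
\iff 1 - a - b + ab \le 1 - 2\sqrt{ab} + ab
\iff a + b \ge 2\sqrt{ab}
\iff \big(\sqrt{a}-\sqrt{b}\big)^2 \ge 0 .
```

The last statement always holds. Since both $(1-a)(1-b)$ and $1-\sqrt{ab}$ are nonnegative for $a, b \in [0,1]$, taking square roots gives (141).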

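Lemma 17 [29, Proposition 2.2] Let $\vartheta(z) \in \mathcal{S}(\mathbb{R}^{+})$ with $\mu$ being its associated measure carried by $\mathbb{R}^{+}$. We
have the following results:

1) $\vartheta(z)$ is analytic on $\mathbb{C} - \mathbb{R}^{+}$;

2) $\Im\{\vartheta(z)\} > 0$ if $\Im\{z\} > 0$, and $\Im\{\vartheta(z)\} < 0$ if $\Im\{z\} < 0$;

3) $\Im\{z\vartheta(z)\} > 0$ if $\Im\{z\} > 0$, and $\Im\{z\vartheta(z)\} < 0$ if $\Im\{z\} < 0$;

4) $\mu(\mathbb{R}^{+}) = \lim_{y\to\infty} -\mathrm{i}y\,\vartheta(\mathrm{i}y)$.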
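Properties 2) to 4) of Lemma 17 can be illustrated on a discrete measure carried by $\mathbb{R}^{+}$, whose Stieltjes transform is $\vartheta(z) = \sum_i w_i/(\lambda_i - z)$. In the sketch below, the atoms and weights are hypothetical and all atoms are strictly positive, which makes the inequality in 3) strict. The limit in 4) is approximated at a single large $y$.

```python
def stieltjes(atoms, weights, z):
    """theta(z) = sum_i w_i / (lambda_i - z) for a discrete measure on R^+."""
    return sum(w / (lam - z) for lam, w in zip(atoms, weights))


atoms = [0.3, 1.0, 2.0, 5.0]    # hypothetical support points in R^+
weights = [0.1, 0.4, 0.3, 0.2]  # hypothetical masses, total mass 1

for z in [0.5 + 1.0j, -2.0 + 0.1j, 3.0 + 4.0j]:
    t = stieltjes(atoms, weights, z)
    assert t.imag > 0 and (z * t).imag > 0      # properties 2) and 3) for Im z > 0
    tc = stieltjes(atoms, weights, z.conjugate())
    assert tc.imag < 0 and (z.conjugate() * tc).imag < 0  # and for Im z < 0

y = 1e5
mass = (-1j * y * stieltjes(atoms, weights, 1j * y)).real  # property 4) at one large y
print(round(mass, 6))
```

For this measure, $\Im\{z\vartheta(z)\} = \sum_i w_i\lambda_i\,\Im\{z\}/|\lambda_i - z|^2$, which shows directly why the support must lie in $\mathbb{R}^{+}$ for property 3).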